Why can’t HttpUrl just encode
Extra escaping is safe in most programming languages and document formats. For example, in HTML there’s no behavioral consequence to replacing a non-delimiter " character with its escape sequence &quot;. Or in JSON, it’s safe to replace a character in a string with its \u escape sequence.
But URLs are different because URL encoding is semantic: you cannot encode a URL without changing it. This is weird!
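The contrast can be checked with a small sketch. It uses Python’s standard library purely for illustration (not HttpUrl itself), and the sample strings are assumptions chosen for the demo:

```python
import html
import json
from urllib.parse import unquote

# HTML: a literal quote and its escape sequence decode to the same text.
html_plain = html.unescape('say "hi"')
html_escaped = html.unescape("say &quot;hi&quot;")

# JSON: a character and its \u escape decode to the same string.
json_plain = json.loads('"A"')
json_escaped = json.loads('"\\u0041"')

# URL: escaping the % a second time yields a different decoded value.
url_once = unquote("100%25")
url_twice = unquote("100%2525")

print(html_plain == html_escaped)  # True
print(json_plain == json_escaped)  # True
print(url_once == url_twice)       # False
```

The first two comparisons hold because the extra escape is undone by the decoder; the third fails because the extra escape becomes part of the decoded value.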
Too Much Encoding
Suppose we’re looking up 100% on DuckDuckGo. Since the code point for % is 0x25, that character encodes as %25 and the whole URL is https://duckduckgo.com/?q=100%25.
But what if we encode the already-encoded URL of that query? We would double-encode the % as %2525 and end up searching for 100%25. Yuck: https://duckduckgo.com/?q=100%2525.
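Here is a minimal sketch of the double-encoding problem, again using Python’s urllib.parse as a stand-in for a URL library. The helper name encode_query_value is hypothetical and exists only for this example:

```python
from urllib.parse import quote, unquote


def encode_query_value(value: str) -> str:
    """Percent-encode a raw query value (hypothetical helper for illustration)."""
    # safe="" means no reserved character is left unencoded.
    return quote(value, safe="")


once = encode_query_value("100%")   # encode the raw search term
twice = encode_query_value(once)    # mistakenly encode the encoded form again

print("https://duckduckgo.com/?q=" + once)   # https://duckduckgo.com/?q=100%25
print("https://duckduckgo.com/?q=" + twice)  # https://duckduckgo.com/?q=100%2525
print(unquote(twice))                        # 100%25
```

The encoder has no way to tell that its input was already encoded, so it treats the % in %25 as a literal character and escapes it again.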
Too Little Encoding
Next we’ll search for #1 on Google. We’ll encode # as %23 and get this URL: https://www.google.ca/search?q=%231.
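As a rough check of the correctly encoded URL, the sketch below builds it and parses it back with Python’s urllib.parse. This is an illustration under the assumption that a generic URL parser behaves like the one in question, not a demonstration of HttpUrl:

```python
from urllib.parse import parse_qs, quote, urlsplit

# Encode the raw search term, then append it to the query.
url = "https://www.google.ca/search?q=" + quote("#1", safe="")
parts = urlsplit(url)

print(url)                    # https://www.google.ca/search?q=%231
print(parts.query)            # q=%231
print(parse_qs(parts.query))  # {'q': ['#1']}
```

With the # escaped as %23, the whole search term stays inside the query and decodes back to #1.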
What if we forget to encode the # in the query? Since # is used as