tomcat-users mailing list archives

From André Warnier (tomcat) ...@ice-sa.com>
Subject Re: URL-encoding and "#"
Date Fri, 13 Oct 2017 17:15:07 GMT
On 13.10.2017 18:17, Mark Thomas wrote:
> On 13/10/2017 17:09, James H. H. Lampert wrote:
>> Thanks to all of you who responded.
>>
>> I found a web page that explains it in ways that I can wrap my
>> 55-year-old brain around, and has an easy-to-read reference chart.
>>
>> https://perishablepress.com/stop-using-unsafe-characters-in-urls/
>>
>> Question: the problem first showed up on a web service that takes a
>> "bodyless" POST operation, and I assume it also applies to GET
>> operations, and to the URL portion of a POST with a body.
>>
>> But what about the body of a POST?
>
> From an HTTP specification point of view, anything goes.

With respect, I believe that "anything goes" is a bit imprecise here.

See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4

There are 2 ways for a user agent to send the content of an HTML form in an HTTP POST:
1) with Content-type header = application/x-www-form-urlencoded
or
2) with Content-type header = multipart/form-data

and while it is true that in case (2) each submitted key=value pair is sent separately,
'as is', this is not necessarily so in case (1), because there all key=value pairs are
concatenated into one long string, in which the different key=value pairs are separated
by (unescaped) "&" signs.
(Apart from other required encodings, see the page above.)
So if the client is not a browser, and composes the POST body itself before sending it
with Content-type (1), it had better encode the individual keys and values as described
before concatenating them, because that is what the server will expect.
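
To illustrate the point about case (1), here is a minimal sketch in Python of what a custom
client would have to do. The function name encode_form_body and the field values are made up
for the example; quote_plus and parse_qsl are from the standard urllib.parse module.

```python
from urllib.parse import quote_plus, parse_qsl

def encode_form_body(pairs):
    """Encode each key and each value separately, then join the pairs with unescaped '&'."""
    return "&".join(f"{quote_plus(key)}={quote_plus(value)}" for key, value in pairs)

# Hypothetical form fields: the values contain '#', '&' and '=',
# which would corrupt the body if the pairs were joined without encoding.
fields = [("title", "C# & Java"), ("note", "a=b")]

body = encode_form_body(fields)
print(body)

# What a server-side parser recovers from that body.
print(parse_qsl(body))
```

The only "&" and "=" left unescaped in the body are the ones that separate the pairs and the
keys from the values, so the server can split the string back into the original fields.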

As an additional note, if the data in the client can contain Unicode text, do not forget
that this is (still) not the standard in HTTP (nor in URIs, and thus in query-string-like
things), and make sure that you use the proper method to encode any printable characters
which are not purely US-ASCII.  Again, browsers generally do this correctly, but custom
clients do not necessarily. (And a "custom client", in this case, could even be a bit of
JavaScript which is embedded in one of your own pages, but makes its own calls to the
server on the side.)
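
As a rough illustration of why the character encoding matters, the sketch below (Python
again) percent-encodes the same value once as UTF-8 and once as ISO-8859-1, and then decodes
both as a server assuming UTF-8 would. The value "café" and the variable names are only
assumptions for the example; what a given server actually assumes depends on its
configuration.

```python
from urllib.parse import quote_plus, unquote_plus

value = "café"  # hypothetical form value with one non-US-ASCII character

# The same text gives different percent-encoded bytes depending on the charset used.
utf8_encoded = quote_plus(value, encoding="utf-8")
latin1_encoded = quote_plus(value, encoding="iso-8859-1")
print(utf8_encoded, latin1_encoded)

# A server that assumes UTF-8 recovers the first form but not the second.
decoded_ok = unquote_plus(utf8_encoded, encoding="utf-8")
decoded_bad = unquote_plus(latin1_encoded, encoding="utf-8", errors="replace")
print(decoded_ok, decoded_bad)
```

So client and server have to agree on the charset before the percent-encoding step,
otherwise the server ends up with replacement characters or garbage.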

I just recently got bitten by this, even in a quite recent browser, where some JavaScript
function was composing a POST to a server (using type (1) above), and was NOT doing it
correctly, even though the page containing and calling this function was itself declared
as Unicode/UTF-8.
(That was with (and I am too sorely tempted to add "of course" to resist it) some revision
of IE-11, although other revisions of the same browser did not exhibit that same issue.)

[...]

