lucene-solr-dev mailing list archives

From "Yonik Seeley" <>
Subject Re: resin and UTF-8 in URLs
Date Fri, 02 Feb 2007 16:42:48 GMT
On 2/1/07, Chris Hostetter <> wrote:
> ...the only real question in my mind is what to do if user supplied data
> has *NO* charset information of any kind ... for XML the spec seems very
> clear that in that case you test for UTF-8 or UTF-16 ... but for arbitrary
> streams of character data in other formats (CSV, JSON, etc...) it seems
> like trusting the servlet container to tell us the default encoding is the
> right way to go.

For XML, I think trusting the XML parser, rather than the servlet
container, is the better way to go.
That means handing the XML parser an InputStream instead of a Reader.
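
A minimal sketch of what that could look like, assuming a standard JAXP
parser; the class and method names (XmlStreamDemo, rootText) are
hypothetical and not taken from Solr. Given raw bytes, the parser can
inspect the byte order mark and the XML declaration itself, which it cannot
do once something upstream has already decoded the stream into a Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlStreamDemo {

    // Hypothetical helper: parse from raw bytes so the XML parser, not the
    // caller, decides which character encoding to apply.
    public static String rootText(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped only to keep the sketch free of checked exceptions.
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    public static void main(String[] args) {
        // A UTF-16 document; Java's UTF_16 encoder emits a byte order mark.
        String utf16Doc =
                "<?xml version=\"1.0\" encoding=\"UTF-16\"?><add>caf\u00e9</add>";
        byte[] utf16Bytes = utf16Doc.getBytes(StandardCharsets.UTF_16);

        // A document with no encoding information at all; the XML spec
        // has the parser fall back to UTF-8 here.
        byte[] utf8Bytes = "<add>caf\u00e9</add>".getBytes(StandardCharsets.UTF_8);

        // Both parse to the same text without the caller naming a charset.
        System.out.println(rootText(new ByteArrayInputStream(utf16Bytes)));
        System.out.println(rootText(new ByteArrayInputStream(utf8Bytes)));
    }
}
```

Neither call passes a charset; the parser works it out from the bytes in
both cases.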

There *is* one place I think we should use UTF-8 when there isn't a
charset specified:
a POST with "Content-Type: application/x-www-form-urlencoded".

a) You can't get browsers to put a charset there.
b) Browsers by default encode the form data in the charset of the form.
c) We know more than the servlet container in this instance... we know at
   least that our admin pages use UTF-8, and that a POST coming from them
   will be UTF-8.
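
The fallback could be sketched roughly as below. This is an illustration
under assumptions, not Solr's actual code: FormCharsetDemo, charsetOf and
parseForm are hypothetical names, and the body is taken as an
already-read String rather than from a servlet request. An explicit charset
in the Content-Type header is honored; when there is none, UTF-8 is used
instead of the container's default.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FormCharsetDemo {

    // Hypothetical helper: return the charset named in a Content-Type
    // header value, or UTF-8 when the header carries no charset parameter.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                            .replace("\"", "")
                            .trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8; // assumed default, per the argument above
    }

    // Hypothetical helper: decode an application/x-www-form-urlencoded body
    // using the charset chosen by charsetOf. Repeated keys keep the last value.
    public static Map<String, String> parseForm(String body, String contentType) {
        Charset cs = charsetOf(contentType);
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, cs), URLDecoder.decode(value, cs));
        }
        return params;
    }

    public static void main(String[] args) {
        // No charset in the header: the percent-escapes are read as UTF-8.
        System.out.println(parseForm("q=caf%C3%A9",
                "application/x-www-form-urlencoded"));
        // Explicit charset: the header wins over the UTF-8 default.
        System.out.println(parseForm("q=caf%E9",
                "application/x-www-form-urlencoded; charset=ISO-8859-1"));
    }
}
```

URLDecoder.decode(String, Charset) requires Java 10 or later; on older
runtimes the overload taking a charset name as a String does the same job.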
