lucene-solr-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: resin and UTF-8 in URLs
Date Thu, 01 Feb 2007 20:26:50 GMT
On 2/1/07, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
> : > should we add:
> : >  request.setCharacterEncoding ("utf-8")
> : > to GET requests in StandardRequestParser?
> :
> : Perhaps.  I wonder if there's any performance impact, and if it fixes
> : Tomcat's default of latin1 too.
>
> see my comments in the related thread about POST...
>
> http://www.nabble.com/charset-in-POST-from-browser-tf3153057.html#a8744560
>
> ...my reading of the servlet spec was that request.setCharacterEncoding
> only impacted request *body* data, not the URL.

Yeah, hence I wouldn't do it if it only fixed Resin, but if it fixed
Tomcat too, it would save a lot of people headaches.

> According to the javadocs for it, using it also means that if the client
> is well behaved and *does* set a charset in the Content-Type it will be
> ignored.

Content-Type for a GET?

> Solr users should be able to pick their encoding as much as possible -- so
> we definitely shouldn't do anything that overrides the charset specified
> in the request (if there is one)

Sure.

> but we also shouldn't hardcode UTF-8
> anywhere if possible ... the default charset should come from some config
> -- either the solrconfig or the servlet container's config.

The problem is that one needs to be an expert to figure all this crap out.

Defaulting to UTF-8 in a url-encoded POST (where browsers refuse to
add charset) seems like a good default, and one that will increase
interop and prevent people from getting backed into a corner later.
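
A rough sketch of the fallback described above, not Solr's actual code: it
honors a charset named in the Content-Type header and only otherwise falls
back to a configurable default (UTF-8 here). The class CharsetFallback and its
methods resolve and decode are hypothetical names, and the sketch uses
java.net.URLDecoder from the JDK rather than the servlet API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class CharsetFallback {

    /**
     * Returns the charset named in a Content-Type header value, or the
     * given fallback when the header is missing, has no charset parameter,
     * or names a charset this JVM does not know.
     */
    static Charset resolve(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length())
                                   .trim()
                                   .replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Percent-decodes a url-encoded value using the given charset. */
    static String decode(String encoded, Charset charset) {
        try {
            return URLDecoder.decode(encoded, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: the name comes from an existing Charset.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A browser-style form POST: no charset in the Content-Type.
        Charset cs = resolve("application/x-www-form-urlencoded",
                             StandardCharsets.UTF_8);
        System.out.println(cs + " -> " + decode("caf%C3%A9", cs));
    }
}
```

Decoding the same bytes ("caf%C3%A9") as ISO-8859-1 instead yields two
garbage characters in place of the accented letter, which is the headache a
latin1 default causes for clients that send UTF-8.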

-Yonik
