lucene-solr-dev mailing list archives

From Walter Underwood
Subject Re: resin and UTF-8 in URLs
Date Thu, 01 Feb 2007 22:28:47 GMT
Let's not make this complicated for situations that we've never
seen in practice. Java is a Unicode language and always has been.
Anyone running a Java system with a Shift-JIS default should already
know the pitfalls, and know them much better than us (and I know a
lot about Shift-JIS).

The URI spec says UTF-8, so we can be compliant and tell people
to fix their code. If they need to add specific hacks for their
broken software, that is OK. We don't need generic design features
for a few broken clients.

RFC 3986 has been out for two years now. That is long enough for
decently-maintained software to get it right.
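
To make the charset point concrete, here is a minimal sketch of what goes
wrong when a UTF-8 percent-encoded URL value is decoded with some other
charset. The class name UrlCharsetDemo and the sample value are made up
for illustration; the only library call used is the standard
java.net.URLDecoder.decode(String, String). It does not show how any
particular servlet container decodes request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Solr or any servlet container.
class UrlCharsetDemo {

    // Decode a percent-encoded value using the named charset.
    static String decode(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "caf" followed by U+00E9 (e with acute accent), percent-encoded as UTF-8.
        String encoded = "caf%C3%A9";

        // Decoding as UTF-8 recovers the single character U+00E9.
        String asUtf8 = decode(encoded, "UTF-8");

        // Decoding as ISO-8859-1 treats each byte as its own character,
        // yielding U+00C3 followed by U+00A9 instead of U+00E9.
        String asLatin1 = decode(encoded, "ISO-8859-1");

        System.out.println(asUtf8.length());   // 4 characters
        System.out.println(asLatin1.length()); // 5 characters
    }
}
```

The same bytes give two different strings, so a client that sends UTF-8
and a server that assumes another charset will silently disagree about
the query.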


On 2/1/07 2:14 PM, "Chris Hostetter" wrote:

> : If we can do something small that makes the most normal cases work
> : even if the container is not configured, that seems good.
> but how do we know the user wants what we consider the "normal cases" to
> work? ... if every servlet container lets you configure your default
> charset differently, we have no easy way to tell if/when they've
> configured the default properly, to know if we should override it.
> If someone does everything in Shift-JIS, and sets up their servlet
> container with Shift-JIS as their default, and installs solr -- i don't
> want them to think Solr sucks because there is a default in Solr they
> don't know about (or know how to disable) that assumes UTF-8.
> On the other hand: if someone really hasn't thought about charsets at all,
> then it doesn't seem that bad to use whatever default their servlet
> container says to use -- as I understand it some containers (tomcat
> included) pick their default based on the JVM's
> configuration (i assume from the "user.language" sysproperty) ... that
> certainly seems like a better default than for us to assume UTF-8 -- even
> if it is "latin1" for "en", because most novice users are probably okay
> with latin1 ... if you're starting to worry about more complex characters
> that aren't in the default charset your servlet container picks for you,
> then reading a little documentation is a good idea.
> : At the very least, we should change the examples in:
> : etc
> oh absolutely.
> -Hoss
