lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: which unicode version is supported with lucene
Date Fri, 25 Feb 2011 14:16:59 GMT
On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling
<bernd.fehling@uni-bielefeld.de> wrote:
> Hi Yonik,
>
> good point, yes we are using Jetty.
> Do you know if Tomcat has this limitation?

Tomcat's defaults are worse - you need to configure it to use UTF-8 by
default for URLs.
Once you do, it passes all those tests (last I checked).  Those tests
are really about UTF-8 working in GET/POST query arguments.  Solr may
still be able to handle indexing and returning full UTF-8, but you
wouldn't be able to query for it w/o using surrogates if you're using
Jetty.

It would be good to test though - does anyone know how to add a char
above the BMP to utf8-example.xml?

-Yonik
http://lucidimagination.com


> Regards,
> Bernd
>
> Am 25.02.2011 14:54, schrieb Yonik Seeley:
>> On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling
>> <bernd.fehling@uni-bielefeld.de> wrote:
>>> So Solr trunk should already handle Unicode above BMP for field type string?
>>> Strange...
>>
>> One issue is that jetty doesn't support UTF-8 beyond the BMP:
>>
>> /opt/code/lusolr/solr/example/exampledocs$ ./test_utf8.sh
>> Solr server is up.
>> HTTP GET is accepting UTF-8
>> HTTP POST is accepting UTF-8
>> HTTP POST defaults to UTF-8
>> ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
>> ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
>> ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic
>> multilingual plane
>>
>> -Yonik
>> http://lucidimagination.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message