lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: which unicode version is supported with lucene
Date Fri, 25 Feb 2011 15:04:59 GMT
On Fri, Feb 25, 2011 at 9:31 AM, Robert Muir <rcmuir@gmail.com> wrote:
> Then i searched on 'range' via the admin gui to retrieve this
> document, and chrome blew up with "This page contains the following
> errors: error on line 17 at column 306: Encoding error"

I got an error in firefox too.
I added the following example (commented out for now):
    <field name="features">Outside the BMP:𐌈 codepoint=10308, a
circle with an x inside. UTF8=f0908c88 UTF16=d800 df08</field>

I can verify it got into Solr OK by querying with python format (which
escapes everything outside the ascii range for each 16 bit char):
http://localhost:8983/solr/select?q=BMP&wt=python&indent=true

[...]
          u'Outside the BMP:\ud800\udf08 codepoint=10308, a circle
with an x inside. UTF8=f0908c88 UTF16=d800 df08']}]

But firefox complains on XML output, and any other output like JSON it
looks mangled.
My bet is Jetty's UTF8 encoding for the response also doesn't handle
the full range.

-Yonik
http://lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message