lucene-solr-user mailing list archives

From "Lance Norskog" <goks...@gmail.com>
Subject RE: Searching combined English-Japanese index
Date Mon, 01 Oct 2007 19:11:41 GMT
Some servlet containers don't handle UTF-8 out of the box and have to be
configured for it. There is information about this on the wiki.

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, October 01, 2007 9:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching combined English-Japanese index

On 10/1/07, Maximilian Hütter <mh@blue-elephant-systems.com> wrote:
> Yonik Seeley wrote:
> > On 10/1/07, Maximilian Hütter <mh@blue-elephant-systems.com> wrote:
> >> When I search using an English term, I get results but the Japanese 
> >> is not encoded correctly in the response. (although it is UTF-8 
> >> encoded)
> >
> > One quick thing to try is the python writer (wt=python) to see the 
> > actual unicode values of what you are getting back (since the python 
> > writer automatically escapes non-ascii).  That can help rule out 
> > incorrect charset handling by clients.
> >
> > -Yonik
> >
> Thanks for the tip, it turns out that the unicode values are wrong...
> I mean the browser correctly displays what is sent. But I don't know
> how Solr gets these values.
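
To illustrate the tip quoted above, here is a minimal Python sketch (not Solr's own code) of what escaped output can reveal. It assumes the garbling comes from UTF-8 bytes being decoded as ISO-8859-1 somewhere on the way in, which is one common cause but is not confirmed in this thread. The helper names `escape_non_ascii` and `looks_like_mojibake` are hypothetical.

```python
def escape_non_ascii(text: str) -> str:
    """Show every non-ASCII character as a \\uXXXX escape, similar in
    spirit to an ASCII-only response (hypothetical helper)."""
    return "".join(
        ch if ord(ch) < 128 else "\\u%04x" % ord(ch)
        for ch in text
    )


def looks_like_mojibake(text: str) -> bool:
    """Heuristic: True if the text contains non-ASCII characters that all
    fit in ISO-8859-1 and whose bytes happen to form valid UTF-8."""
    if all(ord(ch) < 128 for ch in text):
        return False  # plain ASCII is never garbled this way
    try:
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


original = "日本語"
# Assumption: simulate a sender or container that reads UTF-8 bytes
# as ISO-8859-1.
garbled = original.encode("utf-8").decode("latin-1")

print(escape_non_ascii(original))
print(escape_non_ascii(garbled))
print(looks_like_mojibake(original), looks_like_mojibake(garbled))
```

Correctly stored Japanese shows up as escapes in the CJK range, such as \u65e5. Text that was mis-decoded on the way in shows up as a longer run of values below \u0100, which suggests the damage happened before indexing rather than in the browser.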

OK, so they never got into the index correctly.
The most likely explanation is that the charset wasn't set correctly when
the update message was sent to Solr.
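
A minimal sketch of sending an update with the charset stated explicitly, using only the Python standard library. The URL, the field names `id` and `title`, and the helper names are assumptions for illustration, not details taken from this thread. The point is only that the body is encoded as UTF-8 and the Content-Type header says so.

```python
import urllib.request
from xml.sax.saxutils import escape

# Assumption: a local Solr instance with the XML update handler at this
# path. Adjust to the actual deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(doc_id: str, title: str) -> str:
    """Build a one-document <add> message (field names are hypothetical)."""
    return (
        "<add><doc>"
        '<field name="id">%s</field>'
        '<field name="title">%s</field>'
        "</doc></add>" % (escape(doc_id), escape(title))
    )


def build_update_request(url: str, xml: str) -> urllib.request.Request:
    """Encode the body as UTF-8 and declare that charset in the header,
    so the receiving side does not fall back to a default encoding."""
    return urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_update_request(
    SOLR_UPDATE_URL, build_add_xml("doc-1", "日本語 and English")
)
# Sending is left out so the sketch runs without a server:
# urllib.request.urlopen(request)
```

If the header omits the charset, or the body is encoded in something other than what the header claims, the receiver may decode the bytes with a different encoding and index the wrong characters.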

-Yonik

