lucene-solr-user mailing list archives

From Tricia Williams <pgwil...@student.cs.uwaterloo.ca>
Subject Re: Cyrillic characters
Date Wed, 19 Jul 2006 15:29:29 GMT
Hi Yonik,

    I was incorrect to describe it as _solr encoding_.  Hoss suggested that
it might be a form error - I haven't checked this yet but it sounds
plausible.  What I called the _solr url encoding_ was the q= parameter
translated into <I'm not sure what> encoding in the url.  As I mentioned in
my ps, this translated value is not the same as the one I get when I use IE
to post the same form values.

    You mentioned in another earlier post that q=h%c3%e9 would find
matching hits.  In my experience, the UTF-8 encoded query doesn't
generate any exceptions, but no results are matched.  However,
q=h%e9llo does find matching results (the same result set I get in Luke).
So even if I fix the form encoding errors so that the characters
are encoded as UTF-8, I believe that I would still get incorrect
results.  Will Cyrillic characters be treated any differently than the
diacritic in your example?
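
To make the two encodings concrete, here is a minimal Python sketch (not
from the thread) showing how the same word is percent-encoded as UTF-8 and
as ISO-8859-1, and what a decoder sees if it reads the UTF-8 bytes as
ISO-8859-1.  The word "héllo" is an assumed completion of the example
above, and the idea that the servlet container decodes the query string as
ISO-8859-1 is an assumption that would explain the behaviour, not something
confirmed here.

```python
from urllib.parse import quote, unquote

# Assumed sample word; only "h%e9llo" appears in the message itself.
word = "héllo"

# Percent-encode the same text under two different character encodings.
utf8_form = quote(word, encoding="utf-8")
latin1_form = quote(word, encoding="iso-8859-1")

# Simulate a decoder that assumes ISO-8859-1 (an assumption about the
# container) when it is handed each of the two forms.
utf8_read_as_latin1 = unquote(utf8_form, encoding="iso-8859-1")
latin1_read_as_latin1 = unquote(latin1_form, encoding="iso-8859-1")

print(utf8_form)              # h%C3%A9llo
print(latin1_form)            # h%E9llo
print(utf8_read_as_latin1)    # hÃ©llo  (two wrong characters, no match)
print(latin1_read_as_latin1)  # héllo   (matches the indexed term)
```

If that assumption holds, a UTF-8 encoded query would be decoded without
any exception but would never match the indexed term, which is the pattern
described above.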

    I have solr running in tomcat 5.5.17.

Thanks for all your help,
Tricia


On Tue, 18 Jul 2006, Yonik Seeley wrote:

> On 7/18/06, Tricia Williams <pgwillia@student.cs.uwaterloo.ca> wrote:
>>  My sample query is: Канада (the english word _canada_
>> translated into russian) or
>> %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (utf-8) or
>> %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B
>> (solr url encoding)
>
> Hi Tricia,
> Could you clarify what you mean by "solr url encoding"?  Where do you see 
> this?
> The servlet container decodes URLs, and I'm not sure where in Solr
> that URLs are encoded.
>
> -Yonik
>
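
For reference, the two escaped forms of the sample query quoted above can
be decoded with a few lines of Python (a sketch, not part of the original
exchange).  The first is plain percent-encoded UTF-8.  The second decodes
to HTML numeric character references, which is consistent with the
form-error explanation: a browser may substitute such references when the
form's character set cannot represent the typed characters, though that
cause is not confirmed in this thread.  The variable names are
illustrative only.

```python
from html import unescape
from urllib.parse import unquote

# The two escaped values from the quoted message.
utf8_query = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0"
other_query = (
    "%26%231050%3B%26%231072%3B%26%231085%3B"
    "%26%231072%3B%26%231076%3B%26%231072%3B"
)

# Step 1: undo the percent-encoding of each value.
decoded_utf8 = unquote(utf8_query, encoding="utf-8")
decoded_other = unquote(other_query)

# Step 2: the second value is still wrapped in HTML character references,
# so it needs one more unescaping pass to become text.
final_other = unescape(decoded_other)

print(decoded_utf8)   # Канада
print(decoded_other)  # &#1050;&#1072;&#1085;&#1072;&#1076;&#1072;
print(final_other)    # Канада
```

A server that only undoes the percent-encoding would therefore search for
the literal string of character references rather than the Cyrillic word.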
