lucene-solr-user mailing list archives

From Rajani Maski <rajinima...@gmail.com>
Subject Re: Trouble handling Unit symbol
Date Mon, 02 Apr 2012 04:48:32 GMT
Thank you for the reply.



On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : We have data having such symbols like :  µ
> : Indexed data has  -    Dose:"0 µL"
> : Now , when  it is searched as  - Dose:"0 µL"
>        ...
> : Query Q value observed  : <str name="q">S257:"0 ÂµL/injection"</str>
>
> First off: your "when searched as" example does not match up to your
> "Query Q" observed value (ie: field queries, extra "/injection" text at
> the end) suggesting that you maybe cut/paste something you didn't mean to
> -- so take the rest of this advice with a grain of salt.
>
> If i ignore your "when it is searched as" example and focus entirely on
> what you say you've indexed the data as, and the Q value you are using (in
> what looks like the echoParams output) then the first thing that jumps out
> at me is that it looks like your servlet container (or perhaps your web
> browser if that's where you tested this) is not dealing with the unicode
> correctly -- because although i see a "µ" in the first three lines i
> quoted above (UTF8: 0xC2 0xB5) in your value observed i'm seeing it
> preceded by a "Â" (UTF8: 0xC3 0x82) ... suggesting that perhaps the "µ"
> did not get URL encoded properly when the request was made to your servlet
> container?
>
> In particular, you might want to take a look at...
>
>
> https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
> The example/exampledocs/test_utf8.sh script included with solr
>
>
>
>
> -Hoss
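
The mismatch described in the quoted reply can be reproduced in a few lines. This is a minimal Python sketch, not taken from the thread. It assumes the receiving side decodes the percent-encoded UTF-8 bytes as ISO-8859-1, which is only one plausible cause, since the thread does not state the container's configuration. The names MICRO and utf8_hex and the sample query string are illustrative.

```python
from urllib.parse import quote, unquote

MICRO = "\u00b5"  # MICRO SIGN, the unit symbol in "0 µL"


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex values."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


# Illustrative query, modelled on the indexed value quoted in the thread.
query = f'Dose:"0 {MICRO}L"'

# A client that URL-encodes correctly sends the UTF-8 bytes percent-encoded.
encoded = quote(query, safe="")

# Assumption: the server decodes the request URI as ISO-8859-1 instead of
# UTF-8, so the two bytes 0xC2 0xB5 become two separate characters.
garbled = unquote(encoded, encoding="latin-1")

# Decoding the same bytes as UTF-8 recovers the original query.
intact = unquote(encoded, encoding="utf-8")

print(utf8_hex(MICRO))     # bytes of the micro sign
print(utf8_hex("\u00c2"))  # bytes of the stray character seen in the Q value
print(encoded)
print(garbled)
print(intact == query)
```

If the garbled form matches what echoParams reports, the URI charset setting of the servlet container is a reasonable first place to look, as the SolrTomcat link in the quoted reply suggests.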
