lucene-solr-user mailing list archives

From Daniel Alheiros <Daniel.Alhei...@bbc.co.uk>
Subject Re: Problems querying Russian content
Date Thu, 28 Jun 2007 16:51:05 GMT
Thanks.

Yes, I will do it.

So you may be the best person to ask about indexing Russian content. :)
My indexing process is as follows:
    1. RussianTokenizer
    2. RussianLowerCaseFilter
    3. RussianStopFilter
    4. RussianStemFilter

It seems OK to me, since I'm using the same structure as Lucene's
RussianAnalyzer. Do you think I can improve it somehow?

Regards,
Daniel
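
The four-step chain listed above can be pictured as a pipeline in which each stage consumes the previous stage's tokens. What follows is a toy sketch in Python, not Lucene's code: the function names, the stop-word set and the suffix list are all made up for illustration, and the real RussianStemFilter applies a far more complete stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny resources (assumptions for illustration only).
STOP_WORDS = {"и", "в", "на", "не"}
SUFFIXES = ("ами", "ов", "ы", "а", "и")  # checked in this order

def tokenize(text):
    """Step 1: split the text into runs of Cyrillic letters."""
    return re.findall(r"[а-яёА-ЯЁ]+", text)

def lower_case(tokens):
    """Step 2: normalise case so 'Книги' and 'книги' compare equal."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Step 3: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(tokens):
    """Step 4: strip one known suffix, keeping at least three letters."""
    stemmed = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[:-len(suffix)]
                break
        stemmed.append(t)
    return stemmed

def analyze(text):
    """Run the four stages in the same order as the list above."""
    return stem(remove_stop_words(lower_case(tokenize(text))))

print(analyze("Книги и журналы в Москве"))
```

The point of the sketch is that the same chain has to run on the query text as well: here `analyze("Книга")` and `analyze("книги")` both reduce to the same stem, which is what lets a query match an indexed document.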



On 28.06.2007 17:37, "funtick@efendi.ca" <funtick@efendi.ca> wrote:

> Hi Daniel,
> 
> Ensure that UTF-8 is used everywhere: Solr, web server, app server, HTTP
> headers, etc.
> 
> And do not use
> q=Бамбарбиа Киркуду
> Use this instead (encoded URL):
> q=%D0%91%D0%B0%D0%BC%D0%B1%D0%B0%D1%80%D0%B1%D0%B8%D0%B0+%D0%9A%D0%B8%D1%80%D0%BA%D1%83%D0%B4%D1%83
> 
> http://www.tokenizer.org is a Solr-powered search engine... I need to
> add some large Internet shops from Russia to the crawler...
> 
> Quoting Daniel Alheiros:
> 
>> Hi
>> 
>> I'm having trouble issuing queries against Solr when the "q" parameter
>> contains Russian text (the same applies to Chinese and Arabic).
>> 
>> The problem is that I can't send Russian characters in URLs as they are,
>> because they fall outside the ASCII range, so I'm doing a POST instead.
>> 
>> My application receives the request and logs it (the Russian characters
>> appear correctly in my logs), then calls the Solr server, but Solr does
>> not receive it correctly: in the Solr log the special characters show up
>> as question marks.
>> 
>> Has anyone faced problems like that? My whole system is set up to work in
>> UTF-8 (browser, application servers).
>> 
>> Regards,
>> Daniel
>> 
>> 
>> http://www.bbc.co.uk/
>> This e-mail (and any attachments) is confidential and may contain
>> personal views which are not the views of the BBC unless
>> specifically stated.
>> If you have received it in error, please delete it from your system.
>> Do not use, copy or disclose the information in any way nor act in
>> reliance on it and notify the sender immediately.
>> Please note that the BBC monitors e-mails sent or received.
>> Further communication will signify your consent to this.
>> 
>> 
> 
> 
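
The percent-encoded query quoted above can be reproduced with Python's standard library. This is a minimal sketch, not the setup either poster used: the Solr URL (`http://localhost:8983/solr/select`) is an assumed placeholder, and no request is actually sent. It encodes the query text as UTF-8 for a GET, then builds an equivalent POST whose body is pure ASCII and whose Content-Type header names the charset.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode
from urllib.request import Request

# Assumed placeholder endpoint; substitute the real Solr select URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

query = "Бамбарбиа Киркуду"

# Percent-encode the UTF-8 bytes of the query; spaces become '+'.
encoded = quote_plus(query, encoding="utf-8")
print(encoded)

# GET variant: the whole URL is plain ASCII after encoding.
get_url = SOLR_SELECT_URL + "?" + urlencode({"q": query}, encoding="utf-8")

# POST variant: the form body is percent-encoded too, so it is ASCII-safe,
# and the header tells the receiving server which charset to decode with.
body = urlencode({"q": query}, encoding="utf-8").encode("ascii")
post_request = Request(
    SOLR_SELECT_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
)

# Round trip: decoding as UTF-8 recovers the original text.
assert unquote_plus(encoded, encoding="utf-8") == query
```

Question marks in a server log usually mean the bytes were decoded with a different charset somewhere along the way. Whether that is what happened here is not confirmed in the thread, so the servlet container's own request-decoding settings are worth checking as well.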


