lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: ANN: Luke v. 0.5 released
Date Thu, 24 Jun 2004 10:34:35 GMT
Vladimir Yuryev wrote:

> Hi Andrzej!
> I am sorry for my English :-(
> I will gladly describe the test and try to state its conditions 
> in detail.
>> I don't quite understand what you are saying... Do you suspect 
>> there is a bug in Luke somewhere on the Search tab? If that's the 
>> case, please provide an example.
> 1. Search was made on an index with encoding Cp1251.
> 2. Conditions of search:
>      Analyzer to use for query parsing: RussianAnalyzer
>      Default field is: contents
>      2.1. Enter search expression here: высказался (encoding windows-1251)
>             Result: No Results
>      2.2. Enter search expression here: высказал* (encoding windows-1251)
>             Result: 1 doc (s), url: 

Time to refresh my Russian... :-) OK, the problem seems to be in the 
RussianAnalyzer: it uses RussianLetterTokenizer, which filters out 
anything that is not a letter, so I'm afraid it also filters out the 
wildcard at the end. Not only that, it then passes the tokens through a 
RussianStemmer, which further mutilates the tokens.
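To illustrate the first half of that, here is a minimal, plain-JDK sketch of 
what a letter-only tokenizer does to a trailing wildcard. It is only a toy 
stand-in: the class LetterOnlyTokenizerSketch and its tokenize method are 
made up for this example and are not Lucene's RussianLetterTokenizer, and the 
stemming step is not reproduced at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class, NOT part of Lucene: it only mimics the idea of
// "keep runs of letters, drop everything else".
class LetterOnlyTokenizerSketch {

    // Split the text into maximal runs of letters; any non-letter character
    // (digits, spaces, '*', '?', ...) acts as a separator and is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The '*' is not a letter, so the single token comes out without it.
        System.out.println(tokenize("высказал*"));
    }
}
```

The token that comes out no longer carries the wildcard, so whatever runs 
after the tokenizer sees an ordinary word rather than a prefix pattern.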

Please try the "Parsed query view" on the "Search" tab to see what is 
the result of your query, or paste your query into the text area on the 
AnalyzerTool plugin ("Plugins"), and see what tokens you get using 

I just did it, and the result for "высказал*" was "высказа" - clearly 
not what you wanted.

Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer
