lucene-java-user mailing list archives

From "Vladimir Yuryev" <>
Subject Re: ANN: Luke v. 0.5 released
Date Fri, 25 Jun 2004 07:14:59 GMT
On Thu, 24 Jun 2004 12:34:35 +0200
  Andrzej Bialecki <> wrote:
>Vladimir Yuryev wrote:
>> Hi Andrzej!
>> I am sorry for my English :-(
>> I will gladly describe the test, and I will try to state its 
>> conditions in detail.
>>>    I don't quite understand what you are saying... Do you suspect 
>>> there is a bug in Luke somewhere on the Search tab? If that's the 
>>> case, please provide an example.
>> 1. The search was made on an index with encoding Cp1251.
>> 2. Search conditions:
>>      Analyzer to use for query parsing: RussianAnalyzer
>>      Default field is: contents
>>      2.1. Enter search expression here: высказался (encoding windows-1251)
>>             Result: No Results
>>      2.2. Enter search expression here: высказал* (encoding windows-1251)
>>             Result: 1 doc(s), url: 

>Time to refresh my russian... :-) Ok, the problem seems to be in the 
>RussianAnalyzer - it uses RussianLetterTokenizer, which filters out 
>anything which is a non-letter - I'm afraid it filters out also the 
>wildcard at the end. Not only that, it then passes the tokens through 
>a RussianStemmer, which further mutilates the tokens.
>Please try the "Parsed query view" on the "Search" tab to see what is 
>the result of your query, or paste your query into the text area on 
>the AnalyzerTool plugin ("Plugins"), and see what tokens you get 
>using RussianAnalyzer.
>I just did it, and the result for "высказал*" was "высказа" - clearly 
>not what you wanted.
>Best regards,
>Andrzej Bialecki
>Software Architect, System Integration Specialist
>CEN/ISSS EC Workshop, ECIMF project chair
>EU FP6 E-Commerce Expert/Evaluator
>FreeBSD developer (
>To unsubscribe, e-mail:
>For additional commands, e-mail:
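To make the wildcard problem Andrzej describes concrete, here is a minimal Java sketch of a letter-only tokenizer. It assumes, as his explanation suggests, that every non-letter character ends the current token and is thrown away; it is a simplified model, not the actual source of RussianLetterTokenizer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokenizerSketch {

    // Simplified model of a letter-only tokenizer (NOT Lucene's code):
    // any character for which Character.isLetter() is false (space, digit,
    // '*', '?') ends the current token and is itself discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The trailing '*' is silently dropped, so whatever comes next
        // (e.g. a stemmer) sees a plain word, not a wildcard pattern.
        System.out.println(tokenize("высказал*"));
    }
}
```

Once the '*' is gone, the remaining "высказал" is stemmed like any other word (Andrzej got "высказа"), so the query no longer behaves as a prefix search.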

Hi Andrzej!

At the address: 

you will find the full text in which I searched for the phrase "...Pontiff has 
expressed importance..." (in Russian, "Понтифик высказался о важности").

>Please try the "Parsed query view" on the "Search" tab to see what is the result of your query

On the "Search" tab the phrase was not found. For some reason (?!) the 
problem seemed to be in the second and third words. Searching for the 
words separately (as simple terms) confirmed that the problem is in 
these last two words. Here is what I got with "Analyzer to use for query 
parsing": RussianAnalyzer, and text in encoding Cp1251 entered in 
"Enter search expression here":

1. "Enter search expression here": "Понтифик высказался о важности"
     "Parsed query view": contents:"понтифик высказа важност"
- No Results

2. "Enter search expression here": Понтифик
     "Parsed query view": contents:понтифик 
- 2 doc(s)
"http://
"http://

3. "Enter search expression here": высказался
     "Parsed query view": contents:высказа 
- No Results

4. "Enter search expression here": важности
     "Parsed query view": contents:важност
- No Results

5. "Enter search expression here": Понтифик высказался о важности.
     "Parsed query view": contents:понтифик contents:высказа 
- 2 doc(s): the same documents as in point 2.
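The difference between cases 1 and 5 comes from how the two parsed queries combine their terms. The quoted input becomes a phrase query, which needs every stem at consecutive positions, while the unquoted input becomes a list of optional clauses (Lucene's QueryParser uses OR as its default operator), so one matching term is enough. The Java sketch below models that with a hypothetical in-memory positional index in which only "понтифик" occurs, mirroring cases 2 to 4. The documents, positions and names are invented for illustration, and the model ignores details such as position gaps left by removed stop words.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class QuerySemanticsSketch {

    // Hypothetical positional index: term -> (docId -> positions in that doc).
    // Only "понтифик" is present, as cases 2-4 suggest for the real index.
    static final Map<String, Map<Integer, List<Integer>>> INDEX = Map.of(
            "понтифик", Map.of(1, List.of(0), 2, List.of(5)));

    // Unquoted query: every clause is optional (OR), so a document
    // matches if it contains at least one of the terms.
    static Set<Integer> matchAny(List<String> terms) {
        Set<Integer> docs = new TreeSet<>();
        for (String t : terms) {
            docs.addAll(INDEX.getOrDefault(t, Map.of()).keySet());
        }
        return docs;
    }

    // Quoted query: all terms must occur, at consecutive positions.
    static Set<Integer> matchPhrase(List<String> terms) {
        Set<Integer> docs = new TreeSet<>();
        Map<Integer, List<Integer>> first = INDEX.getOrDefault(terms.get(0), Map.of());
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            int doc = e.getKey();
            for (int start : e.getValue()) {
                boolean ok = true;
                for (int i = 1; i < terms.size() && ok; i++) {
                    List<Integer> pos = INDEX.getOrDefault(terms.get(i), Map.of())
                            .getOrDefault(doc, List.of());
                    ok = pos.contains(start + i);
                }
                if (ok) {
                    docs.add(doc);
                    break;
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> stems = List.of("понтифик", "высказа", "важност");
        System.out.println("phrase: " + matchPhrase(stems)); // like case 1
        System.out.println("any:    " + matchAny(stems));    // like case 5
    }
}
```

Run as-is, it prints an empty set for the phrase and [1, 2] for the OR query, the same pattern as cases 1 and 5. It says nothing about why "высказа" and "важност" are missing from the real index.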

>..., or paste your query into the text area on the AnalyzerTool plugin ("Plugins"), and
>see what tokens you get using RussianAnalyzer.

On the "Plugins" tab, in the "Text to be analyzed" field, I tested the 
same three words as a phrase: "Понтифик высказался о важности".
As a result of the analysis, the "Tokens found" field showed three 
stems: "понтифик", "высказа" and "важност". The "hilite" action gave 
positive results for all three words. (So perhaps the problem is not 
in the filters?) :-)
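The AnalyzerTool result (four words in, three stems out) is consistent with the short preposition "о" being removed as a stop word before stemming. Here is a minimal Java sketch of that part of the chain. It assumes a hypothetical one-word stop set and deliberately leaves out the stemming step; the real analyzer uses its own, much longer Russian stop list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // Hypothetical one-entry stop set, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("о");

    // Lowercase, split on runs of non-letters (\p{L} = any Unicode letter),
    // then drop stop words. Stemming is intentionally omitted.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Понтифик высказался о важности"));
    }
}
```

It prints [понтифик, высказался, важности]; in the real analyzer the stemmer would then reduce the last two to "высказа" and "важност", which are the tokens AnalyzerTool reported.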

Best regards,

