lucene-java-user mailing list archives

From "Vladimir Yuryev" <vyur...@rambler.ru>
Subject Re: ANN: Luke v. 0.5 released
Date Fri, 25 Jun 2004 07:14:59 GMT
On Thu, 24 Jun 2004 12:34:35 +0200
  Andrzej Bialecki <ab@getopt.org> wrote:
>Vladimir Yuryev wrote:
>
>> Hi Andrzej!
>> 
>> I am sorry for my English :-(
>> I will gladly describe the test and try to state its conditions 
>> in detail.
>> 
>>>    I don't quite understand what you are saying... Do you suspect 
>>> there is a bug in Luke somewhere on the Search tab? If that's the 
>>> case, please provide an example.
>> 
>> 
>> 
>> 1. The search was made on an index with encoding Cp1251.
>> 2. Conditions of the search:
>>      Analyzer to use for query parsing:
>>      org.apache.lucene.analysis.ru.RussianAnalyzer
>>      Default field is: contents
>> 
>>      2.1. Enter search expression here: высказался (encoding 
>>           windows-1251)
>>           Result: No Results
>>      2.2. Enter search expression here: высказал* (encoding 
>>           windows-1251)
>>           Result: 1 doc (s), url: 
>> http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news

>
>Time to refresh my russian... :-) Ok, the problem seems to be in the 
>RussianAnalyzer - it uses RussianLetterTokenizer, which filters out 
>anything which is a non-letter - I'm afraid it filters out also the 
>wildcard at the end. Not only that, it then passes the tokens through 
>a RussianStemmer, which further mutilates the tokens.
>
>Please try the "Parsed query view" on the "Search" tab to see what is 
>the result of your query, or paste your query into the text area on 
>the AnalyzerTool plugin ("Plugins"), and see what tokens you get 
>using RussianAnalyzer.
>
>I just did it, and the result for "высказал*" was "высказа" - clearly 
>not what you wanted.
>
>-- 
>Best regards,
>Andrzej Bialecki
>
>-------------------------------------------------
>Software Architect, System Integration Specialist
>CEN/ISSS EC Workshop, ECIMF project chair
>EU FP6 E-Commerce Expert/Evaluator
>-------------------------------------------------
>FreeBSD developer (http://www.freebsd.org)
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

Hi Andrzej!

Well, the full text is at this address: 
"http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news"

In that text I searched for the phrase "...Pontiff has expressed 
importance...", in Russian "Понтифик высказался о важности".


>Please try the "Parsed query view" on the "Search" tab to see what is the result of your
>query

On the "Search" tab the phrase was not found. The problem seems to be 
(for some reason?!) in the second and third words: searching for the 
words separately (as simple terms) showed that these last two words 
are the ones that fail. 
So, with "Analyzer to use for query parsing:" set to 
org.apache.lucene.analysis.ru.RussianAnalyzer, and with the following 
typed into "Enter search expression here" [text in encoding Cp1251]:

1. "Enter search expression here": "Понтифик высказался о важности"
     "Parsed query view": contents:"понтифик высказа важност"
   - No Results

2. "Enter search expression here": Понтифик
     "Parsed query view": contents:понтифик
   - 2 doc (s)
   URLs:
   "http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v01.txt&print=news"
   "http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news"

3. "Enter search expression here": высказался
     "Parsed query view": contents:высказа
   - No Results

4. "Enter search expression here": важности
     "Parsed query view": contents:важност
   - No Results

5. "Enter search expression here": Понтифик высказался о важности
     "Parsed query view": contents:понтифик contents:высказа contents:важност
   - 2 doc (s) -> the same documents as in point 2.

>.., or paste your query into the text area on the AnalyzerTool plugin ("Plugins"), and
>see what tokens you get using RussianAnalyzer.

On the "Plugins" tab, in the field "Text to be analyzed", I tested the 
same three words as a phrase: "Понтифик высказался о важности". 
As a result of the analysis, the field "Tokens found" showed three 
stems: "понтифик", "высказа" and "важност". The "hilite->" action 
gave positive results for all three words. (So it seems the problem is 
not in the filters?) :-)
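
One possible reading of these results, which nothing in this thread 
confirms, is that the index itself holds unstemmed words while the 
query side is stemmed by RussianAnalyzer. The toy Python sketch below 
only illustrates that hypothesis. It does not use Lucene: the STEMS 
table is a hypothetical stand-in for the stemmer, filled in from the 
"Parsed query view" output above, and the one-sentence DOC, the stop 
word "о" and all helper names are assumptions made for illustration.

```python
# Toy illustration only; no Lucene involved.
# Hypothetical stand-in for the stemming step: the word -> stem pairs
# are copied from the "Parsed query view" output reported above.
STEMS = {
    "понтифик": "понтифик",
    "высказался": "высказа",
    "важности": "важност",
}
STOP_WORDS = {"о"}  # assumption: "о" is dropped, as in the parsed phrase

# Assumption: a one-sentence document standing in for the indexed page.
DOC = "Понтифик высказался о важности"


def analyze(text, stem):
    """Lowercase, split on whitespace, drop stop words, optionally stem."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [STEMS.get(t, t) for t in tokens] if stem else tokens


def term_hits(index_terms, term):
    """Exact term lookup, like a simple term query."""
    return term in index_terms


def prefix_hits(index_terms, prefix):
    """Prefix lookup, assuming the prefix is matched exactly as typed."""
    return any(t.startswith(prefix) for t in index_terms)


# Hypothesis: the index was built WITHOUT stemming.
unstemmed_index = set(analyze(DOC, stem=False))
# For comparison: an index built WITH the same stemming as the query.
stemmed_index = set(analyze(DOC, stem=True))

for word in ["Понтифик", "высказался", "важности"]:
    query_term = analyze(word, stem=True)[0]  # the query side is stemmed
    print(word, "->", query_term,
          "| unstemmed index:", term_hits(unstemmed_index, query_term),
          "| stemmed index:", term_hits(stemmed_index, query_term))

print("prefix высказал | unstemmed index:",
      prefix_hits(unstemmed_index, "высказал"))
```

Under that assumption the sketch shows the same pattern as reported: 
"понтифик" matches because its stem equals the word itself, the other 
two stems miss, and the prefix "высказал" still finds "высказался". 
Comparing the terms actually stored in the index with the "Parsed 
query view" output would be one way to check whether this is what 
happens here.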

Best regards,
Vladimir.

