lucene-java-user mailing list archives

From "Daniel Taurat" <daniel.tau...@gaussvip.com>
Subject RE: jaspq: dashed numerical values tokenized differently
Date Wed, 03 Nov 2004 10:03:33 GMT


-----Original Message-----
From: Morus Walter [mailto:morus.walter@tanto.de] 
Sent: Tuesday, 2 November 2004 09:21
To: Lucene Users List
Subject: Re: jaspq: dashed numerical values tokenized differently 

>Daniel Taurat writes:
>> Hi,
>> I have yet another stupid parser question:
>> The dash sign "-" seems to be handled differently from Lucene 1.2,
>> at least in the Lucene 1.4 RC3 StandardAnalyzer.
>> 
>> Examples (1.4RC3):
>> 
>> A document containing the string "dash-test" is matched by the following
>> search expressions:
>> dash
>> test
>> dash*
>> dash-test
>> It is _not_ matched by the following search expressions:
>> dash-*
>> dash-t*
>> 
>> If the string after the dash consists of digits, the behavior is
>> different.
>> E.g., a document containing the string "dash-123" is matched by:
>> dash*
>> dash-*
>> dash-123
>> It is not matched by:
>> dash
>> 123
>> 
>> Question:
>> Is this, esp. the different behavior when parsing digits and characters,
>> intentional and how can it be explained?
>> Regards,
>> 
>The query parser was changed to treat '-' within words as part of the word.
>Before that change, the query 'dash-test' was parsed as 'dash AND NOT test'.
>Now the QP reads one word, 'dash-test', which is then analyzed. If the
>analyzer splits it into more than one token (the standard analyzer does),
>a phrase query is created.
>The difference you see comes from the standard analyzer, which tokenizes
>'dash-test dash-123' into the tokens dash, test and dash-123.
>Prefix queries aren't analyzed.
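
The explanation above can be illustrated with a small toy model. This is
not Lucene code and uses no Lucene API: the class DashTokenDemo and its
methods are hypothetical, and the tokenizing rule (a dash-joined pair stays
one token when it contains a digit, otherwise it is split at the dash) is an
assumption inferred from the behavior reported in this thread. The sketch
only shows why an unanalyzed prefix such as "dash-" can match an indexed
term "dash-123" but not the separate terms "dash" and "test".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only, NOT Lucene code. All names here are hypothetical.
final class DashTokenDemo {

    // Assumed rule: "a-b" stays one token if it contains a digit,
    // otherwise it is split at the dash. Tokens are lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String[] parts = word.split("-");
            boolean hasDigit = word.chars().anyMatch(Character::isDigit);
            boolean joinedPair = parts.length == 2
                    && !parts[0].isEmpty() && !parts[1].isEmpty();
            if (joinedPair && hasDigit) {
                tokens.add(word);              // e.g. "dash-123"
            } else {
                for (String part : parts) {    // e.g. "dash", "test"
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    // A query ending in '*' is treated as a prefix query: it is NOT
    // tokenized, and the raw prefix is compared with the indexed terms.
    // Any other query is tokenized; several tokens act like a phrase.
    static boolean matches(String document, String query) {
        List<String> indexed = tokenize(document);
        if (query.endsWith("*")) {
            String prefix = query.substring(0, query.length() - 1);
            return indexed.stream().anyMatch(term -> term.startsWith(prefix));
        }
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(indexed, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String[] queries = {"dash", "dash*", "dash-*", "dash-test", "dash-123"};
        for (String doc : new String[] {"dash-test", "dash-123"}) {
            System.out.println(doc + " -> indexed as " + tokenize(doc));
            for (String q : queries) {
                System.out.println("  " + q + " : " + matches(doc, q));
            }
        }
    }
}
```

Under these assumptions, "dash-*" misses the document "dash-test" because
no indexed term starts with "dash-", while it hits "dash-123" because that
string was indexed as a single term.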



So you are saying that dash-123 is a prefix query whereas dash-test is not?
I also found (with Luke) that dash-anystring123 is not split into separate
tokens either.
What exactly are the criteria Lucene uses to decide what is a prefix and
what is not?

Daniel 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



