lucene-java-user mailing list archives

From sergiu gordea <gser...@ifit.uni-klu.ac.at>
Subject Re: jaspq: dashed numerical values tokenized differently
Date Tue, 02 Nov 2004 06:56:20 GMT
Daniel Taurat wrote:

>Hi,
>I have yet another stupid parser question:
>At least in the Lucene 1.4.RC3 StandardAnalyzer, the dash sign "-" seems
>to be handled differently than in Lucene 1.2.
>
From the behaviour you describe, I think the analyzer removes the dash
sign from the text.
This is reasonable, because a dash is used to separate two words.
If it were not removed, you would not get "dash-test" in the results
when you search for: dash or/and test

I suggest you use Luke (see the contributors page) to see exactly what
you have in the index; then you will understand why the search works
like that.
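
To make the explanation above concrete, here is a minimal sketch of the
tokenization that the quoted examples suggest. It is not Lucene code. The
class DashTokenSketch and the methods tokenize and matchesPrefix are
hypothetical names, and the rule "split on the dash unless the word
contains a digit" is only an assumption inferred from the observed search
results. It also assumes that a wildcard term is compared with the indexed
terms exactly as typed.

```java
import java.util.ArrayList;
import java.util.List;

public class DashTokenSketch {

    // Hypothetical simplification, NOT the analyzer's real grammar:
    // a word containing a digit is kept whole, any other word is split on "-".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            boolean hasDigit = word.chars().anyMatch(Character::isDigit);
            if (hasDigit) {
                tokens.add(word); // e.g. "dash-123" stays one term
            } else {
                for (String part : word.split("-")) {
                    if (!part.isEmpty()) {
                        tokens.add(part); // e.g. "dash-test" -> "dash", "test"
                    }
                }
            }
        }
        return tokens;
    }

    // Assumption: a prefix query such as dash-* is matched against the
    // indexed terms as typed, without being tokenized itself.
    static boolean matchesPrefix(List<String> terms, String prefix) {
        return terms.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dash-test"));                         // [dash, test]
        System.out.println(tokenize("dash-123"));                          // [dash-123]
        System.out.println(matchesPrefix(tokenize("dash-test"), "dash-")); // false
        System.out.println(matchesPrefix(tokenize("dash-123"), "dash-"));  // true
    }
}
```

Under these assumptions, dash-* finds nothing for "dash-test" because only
the terms dash and test exist, while for "dash-123" the whole string is a
single term, so dash and 123 alone do not match it. Luke will show whether
the real index looks like this.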

 Sergiu

>Examples (1.4RC3):
>
>A document containing the string "dash-test" is matched by the following
>search expressions:
>dash
>test
>dash*
>dash-test
>It is _not_ matched by the following search expressions:
>dash-*
>dash-t*
>
>If the string after the dash consists of digits, the behavior is
>different.
>E.g., a document containing the string "dash-123" is matched by:
>dash*
>dash-*
>dash-123
>It is not matched by:
>dash
>123
>
>Question:
>Is this, esp. the different behavior when parsing digits and characters,
>intentional and how can it be explained?
>Regards,
>
>Daniel
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org