lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Taurat" <daniel.tau...@gaussvip.com>
Subject jaspq: dashed numerical values tokenized differently
Date Mon, 01 Nov 2004 15:29:35 GMT
Hi,
I have just another stupid parser question:
There seems to be a special handling of the dash sign "-" different from
Lucene 1.2 at least in Lucene 1.4.RC3
StandardAnalyzer.

Examples (1.4RC3):

A document containing the string "dash-test" is matched by the following
search expressions:
dash
test
dash*
dash-test
It is _not_ matched by the following search expressions:
dash-*
dash-t*

If the string after the dash consists of digits, the behavior is
different.
E.g., a document containing the string "dash-123" is matched by:
dash*
dash-*
dash-123
It is not matched by:
dash
123

Question:
Is this, esp. the different behavior when parsing digits and characters,
intentional and how can it be explained?
Regards,

Daniel





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message