lucene-java-user mailing list archives

From Morus Walter <morus.wal...@tanto.de>
Subject Re: Problem with search results
Date Wed, 03 Mar 2004 07:59:12 GMT
Otis Gospodnetic writes:
> And if you do not use QueryParser, then things work?
> If so, then this is likely caused by the fact that your Term contains a
> 'special' character, '-'.
> 
Actually I was going to suggest a fix for '-' within words in the
query parser.

There was a suggested fix that changed both StandardAnalyzer and QueryParser,
which was rejected, I guess because of the StandardAnalyzer change.

Now I think this can be fixed in the query parser alone by simply allowing
'-' within words.
That is, change
<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >
to
<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >

As a result, the query parser will read '-' within words (such as tft-monitor
or Sysh1-1) as part of one word, which is then tokenized by the analyzer in use
and ends up as a term query or a phrase query, depending on whether the
analyzer produces one token or several.
So with StandardAnalyzer, a query for tft-monitor would become a phrase query
"tft monitor", and Sysh1-1 a term query for "Sysh1-1".
Searching tft-monitor as the phrase "tft monitor" is not exact, but it is the best
approximation possible once you have indexed tft-monitor as the tokens tft and monitor.

The effect of a '-' that does not occur within a word is not changed, so
tft -monitor will still search for 'tft AND NOT monitor'.
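
To make the intended behaviour concrete, here is a small stand-alone sketch.
It does not use Lucene at all. The class HyphenQuerySketch and its methods
analyze() and describe() are made up for illustration, and analyze() is only a
crude stand-in for an analyzer: it keeps a word containing a digit whole and
splits any other word at '-', which merely mimics the two cases described
above (it also skips lowercasing). The point is just that a '-' inside a word
goes to the analyzer, while a '-' at the start of a word stays an operator.

```java
import java.util.ArrayList;
import java.util.List;

class HyphenQuerySketch {

    // Crude stand-in for an analyzer (an assumption, not Lucene code):
    // a word containing a digit stays whole, any other word is split at '-'.
    static List<String> analyze(String word) {
        if (word.chars().anyMatch(Character::isDigit)) {
            return List.of(word);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : word.split("-")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Describe the clause produced for each whitespace-separated word.
    // A leading '-' marks the clause as prohibited; a '-' inside the word
    // is left in place and handed to analyze().
    static List<String> describe(String query) {
        List<String> out = new ArrayList<>();
        for (String raw : query.trim().split("\\s+")) {
            boolean prohibited = raw.length() > 1 && raw.startsWith("-");
            String word = prohibited ? raw.substring(1) : raw;
            List<String> tokens = analyze(word);
            if (tokens.isEmpty()) {
                continue;
            }
            // One token gives a term query, several give a phrase query.
            String kind = tokens.size() == 1 ? "term" : "phrase";
            out.add((prohibited ? "NOT " : "") + kind + ":" + String.join(" ", tokens));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String q : new String[] {"tft-monitor", "Sysh1-1", "tft -monitor"}) {
            System.out.println(q + " => " + describe(q));
        }
    }
}
```

With these assumptions, tft-monitor comes out as one phrase clause, Sysh1-1 as
one term clause, and tft -monitor as a term clause for tft plus a prohibited
term clause for monitor.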

Is that a change that would be acceptable?
I didn't find the time to look at the regression tests though.

Morus
