lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tatu Saloranta <t...@hypermall.net>
Subject Re: '-' character not interpreted correctly in field names
Date Mon, 03 Feb 2003 15:37:18 GMT
On Monday 03 February 2003 07:19, Terry Steichen wrote:
> I believe that the tokenizer treats a dash as a token separator.  Hence,
> the only way, as I recall, to eliminate this behavior is to modify
> QueryParser.jj so it doesn't do this.  However, doing this can cause some
> other problems, like hyphenated words at a line break and the like.

It might be enough to just replace analyzer passed in to QueryParser
to do this? This is the case if QueryParser only handles modifiers outside
terms, and terms are passed to analyzer.
I think this is the case (QueryParser does  call the analyzer in couple of 
places, and one word may actually expand to a phrase or vice versa)?

Still, it seems like using a hyphen as separator shouldn't necessarily cause 
big problems when indexer does the same; queries against "2 - 5" would be 
phrase queries for "2 5", which is still reasonably specific (and should 
match the content).

On the other hand, simple analyzer and standard analyzer have pretty different 
tokenization rules, so it's important to make sure same analyzer is used for 
both indexing and searching (that mismatch can prevent matches easily).

-+ Tatu +-



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message