lucene-java-user mailing list archives

From Morus Walter <morus.wal...@gmx.de>
Subject Re: Problem with search results
Date Sat, 06 Mar 2004 08:32:31 GMT
Doug Cutting writes:
> Morus Walter wrote:
> > Now I think this can be fixed in the query parser alone by simply allowing
> > '-' within words.
> > That is, change
> > <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >
> > to
> > <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >
> > 
> > As a result, the query parser will read '-' within words (such as tft-monitor
> > or Sysh1-1) as one word, which will be tokenized by the analyzer in use
> > and end up in a term query or phrase query, depending on whether it creates
> > one or more tokens.
> 
> Other characters which are also candidates for this sort of treatment 
> include "/", "@", ".", "'", and "+".
> 
_TERM_START_CHAR is
| <#_TERM_START_CHAR: ( ~[ " ", "\t", "\n", "\r", "+", "-", "!", "(", ")", 
     ":", "^", "[", "]", "\"", "{", "}", "~", "*", "?" ]

so "/", "@", "." and "'" are already allowed in terms; of the characters
you list, only "+" is still missing.
(":", "^", "~", "*" and "?" cannot be added, and parentheses don't make sense.)
So I end up with
<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >
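
To illustrate the effect of this change, here is a minimal Java sketch. It
is not the JavaCC-generated query parser. It is a hand-written scanner that
only mimics the two character classes quoted above. The names
TermCharSketch, scan and patched are hypothetical, escaped characters
(_ESCAPED_CHAR) are ignored, and every non-term character is simply emitted
as a one-character token.

```java
import java.util.ArrayList;
import java.util.List;

public class TermCharSketch {
    // Characters excluded from _TERM_START_CHAR in the grammar quoted above.
    private static final String EXCLUDED = " \t\n\r+-!():^[]\"{}~*?";

    // A term may start with any character that is not in the excluded set.
    static boolean isTermStartChar(char c) {
        return EXCLUDED.indexOf(c) < 0;
    }

    // Inside a term: the old rule allows only term start characters;
    // the patched rule additionally allows '-' and '+'.
    // Assumption: escaped characters are left out of this sketch.
    static boolean isTermChar(char c, boolean patched) {
        if (isTermStartChar(c)) {
            return true;
        }
        return patched && (c == '-' || c == '+');
    }

    // Splits the input into terms and single-character operator tokens.
    static List<String> scan(String input, boolean patched) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isTermStartChar(c)) {
                int start = i++;
                while (i < input.length() && isTermChar(input.charAt(i), patched)) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                // Whitespace is skipped; anything else becomes its own token.
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c));
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("tft-monitor", false)); // [tft, -, monitor]
        System.out.println(scan("tft-monitor", true));  // [tft-monitor]
        System.out.println(scan("-Sysh1-1", true));     // [-, Sysh1-1]
    }
}
```

In this sketch a leading '-' is still read as an operator, because '-' and
'+' are added only to the characters allowed inside a term, not to the
characters a term may start with.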

The regression tests show no errors, so I entered the change in Bugzilla.

Morus
