lucene-dev mailing list archives

From Lukas Zapletal <l...@root.cz>
Subject Re: Escaping bug \( and ? or *
Date Sun, 02 Feb 2003 16:44:39 GMT
Tatu Saloranta wrote:

>I think the problem is that the analyzer you used for indexing strips out 
>parentheses. So the text actually indexed would look something like
>"test 1 test 2" (assuming 'and' is a stop word and is removed). Thus there's
>no token matching the term "(1)" or "(2)".
>The same goes for most other punctuation characters: they are routinely
>stripped by the analyzer, as they usually are not very useful for searching.
>
>To make it work the way you want, you need to modify the analyzer to 
>include parentheses, perhaps so that they are kept only if
>they contain just a single alphanumeric token (otherwise
>"(1 and 2)" would be tokenized to "(1" and "2)", which is probably
>not what you want).
>
Well, I don't think that is the cause.

I use the same analyzer for queries as well, so parentheses and other 
punctuation are also stripped when I build the query.
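
To illustrate the point, here is a minimal sketch in plain Java. It is not Lucene's API: the class name, the stop-word list and the sample text "test (1) and test (2)" are assumptions made only for illustration. It mimics an analyzer that lowercases, splits on punctuation and drops stop words, and applies it to both the indexed text and the query. Under those assumptions the query "(1)" is reduced to the token "1", which is present among the indexed tokens, so stripping alone should not prevent a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; this is NOT Lucene's API.
class AnalyzerSketch {
    // Assumed stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "the"));

    // Lowercase the text, split on any run of non-alphanumeric
    // characters, and drop empty pieces and stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty() && !STOP_WORDS.contains(piece)) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The same analysis is applied at index time and at query time.
        List<String> indexed = analyze("test (1) and test (2)");
        List<String> query = analyze("(1)");

        System.out.println(indexed);                    // [test, 1, test, 2]
        System.out.println(query);                      // [1]
        System.out.println(indexed.containsAll(query)); // true
    }
}
```

This only models the analysis step. It says nothing about how the query parser treats an escaped "\(" before the analyzer sees the text, which is where I suspect the problem lies.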

This MAY be a bug. PLEASE TEST THE CODE.

-- 
Lukas Zapletal      [lzap@root.cz]
http://www.tanecni-olomouc.cz/lzap




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org