lucene-dev mailing list archives

From Tatu Saloranta <t...@hypermall.net>
Subject Re: Escaping bug \( and ? or *
Date Sat, 01 Feb 2003 20:47:57 GMT
On Friday 31 January 2003 13:27, Lukas Zapletal wrote:
> Hello all,
>
> Let's have an indexed text "Test (1) and test (2)".
>
> Now search for: \(1\)
>
> Everything OK, so let's search for: \(?\)
>
> Nothing found! It's the same with \" and maybe other escaped characters.
>
> Is this a bug? Is it already solved in the CVS? If not, how can we fix it?

I think the problem is that the analyzer you used for indexing strips out
parentheses. So the text actually indexed would look something like
"test 1 test 2" (assuming 'and' is a stop word and is removed). Thus there is
no token matching the term "(1)" or "(2)".
The same goes for most other punctuation characters: they are routinely
stripped by the analyzer, as they are usually not very useful for searching.
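
As a rough illustration of that stripping, here is a minimal sketch in plain
Java. It does not use the Lucene API; the class name, the stop-word list, and
the splitting rule are all assumptions made up for this example, not what any
particular analyzer actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer that drops punctuation and stop words.
class StrippingAnalyzerSketch {
    // Assumed stop-word list; a real analyzer's list would differ.
    private static final Set<String> STOP_WORDS = Set.of("and", "the", "a", "of");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Lowercase, then split on every run of characters that is
        // neither a letter nor a digit (this is where "(" and ")" vanish).
        String[] pieces = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String piece : pieces) {
            // split() can yield an empty leading piece, e.g. for "(1)".
            if (!piece.isEmpty() && !STOP_WORDS.contains(piece)) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Test (1) and test (2)")); // [test, 1, test, 2]
    }
}
```

With tokens like these in the index, no term contains a parenthesis, so a
query term that still carries one has nothing to match.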

To make it work the way you want, you need to modify the analyzer to
include parentheses, perhaps so that they are kept only if
they enclose just a single alphanumeric token (otherwise
"(1 and 2)" would be tokenized to "(1" and "2)", which is probably
not what you want).
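
One possible shape for that rule, again as a plain-Java sketch rather than
real Lucene analyzer code: a regular expression that first tries to match a
parenthesized single alphanumeric run and otherwise falls back to the bare
run. The class name and pattern are hypothetical, and stop-word filtering is
left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer that keeps "(x)" only when x is one alphanumeric run.
class ParenKeepingTokenizerSketch {
    // First alternative: "(" + letters/digits + ")".
    // Second alternative: a bare run of letters/digits.
    private static final Pattern TOKEN =
            Pattern.compile("\\([\\p{L}\\p{N}]+\\)|[\\p{L}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Test (1) and test (2)")); // [test, (1), and, test, (2)]
        System.out.println(tokenize("(1 and 2)"));             // [1, and, 2]
    }
}
```

The query text would need to go through the same tokenization for a search
on "(1)" to find the indexed "(1)" token.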

-+ Tatu +-

