lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: wildcard searching issue
Date Fri, 07 Feb 2003 04:14:35 GMT
On Thursday 06 February 2003 11:12, D.L.B. wrote:
> Hi everyone,
> We've uncovered some perhaps undesirable behavior when doing a wildcard
> search against a stemmed index.  These issues may be part of the problems
> referenced in the thread "Too few search results".
> The problem is that, for prefix and wildcard queries, the query string is
> not sent to the analyzer for tokenization (and stemming).  This can result
> in expected hits not being returned.  For example:
> I've coded a fix to this in QueryParser.jj.  In the cases like the above,
> take the word without the '*' and send it to the analyzer.  If a single
> token is returned, use it to create the PrefixQuery or WildcardQuery.  So,
> if you search for "pipette*", send "pipette" to the analyzer, get "pipet"
> back, create the PrefixQuery using "pipet", not "pipette".
> If y'all feel this is an issue that needs fixin', let me know and I'll post
> my fix.
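
For readers following along, the approach described in the quoted message can be sketched roughly as below. This is not the actual QueryParser.jj patch (which was not posted in this message); the class name, the toyAnalyze helper, and the tiny stem table are hypothetical stand-ins for a real stemming analyzer. The fallback to the unanalyzed text when the analyzer does not return exactly one token is also an assumption, since the quoted text only covers the single-token case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Lucene code: all names here are made up.
final class PrefixTermSketch {

    // Toy stem table standing in for a real stemmer (assumption).
    private static final Map<String, String> STEMS =
            Map.of("pipette", "pipet", "pipettes", "pipet");

    // Toy analyzer: lower-case, split on whitespace, look up a stem.
    static List<String> toyAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // skip empty pieces from leading whitespace or empty input
            }
            tokens.add(STEMS.getOrDefault(raw, raw));
        }
        return tokens;
    }

    // Strip a trailing '*', analyze the rest, and use the result as the
    // prefix term only if the analyzer produced exactly one token.
    // Otherwise fall back to the unanalyzed text (assumed behavior).
    static String prefixTermFor(String queryText,
                                Function<String, List<String>> analyzer) {
        String bare = queryText.endsWith("*")
                ? queryText.substring(0, queryText.length() - 1)
                : queryText;
        List<String> tokens = analyzer.apply(bare);
        return tokens.size() == 1 ? tokens.get(0) : bare;
    }
}
```

With this toy analyzer, prefixTermFor("pipette*", PrefixTermSketch::toyAnalyze) yields the stemmed prefix, which is the term a PrefixQuery would then be built from.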

Perhaps a good fix would be to improve QueryParser to accept a second analyzer
(in addition to the default one) that is used only for tokenizing wildcard /
prefix terms? Often a simple analyzer (for example, one that just lower-cases
its input) should do nicely.
This could be done by adding a method for setting such an analyzer; the default
would be to not use any analyzer for these terms (to keep backwards
compatibility).
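
A minimal sketch of what such a setter might look like, under loud assumptions: the class and method names (WildcardAwareParserSketch, setWildcardAnalyzer, normalizeTerm) are hypothetical and are not part of Lucene's QueryParser, and analyzers are modeled as plain string-to-string functions rather than real token streams.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Lucene's QueryParser: all names are made up.
final class WildcardAwareParserSketch {

    private final UnaryOperator<String> defaultAnalyzer;

    // null means "do not analyze wildcard/prefix terms", i.e. the
    // existing behavior is kept unless a caller opts in.
    private UnaryOperator<String> wildcardAnalyzer;

    WildcardAwareParserSketch(UnaryOperator<String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void setWildcardAnalyzer(UnaryOperator<String> analyzer) {
        this.wildcardAnalyzer = analyzer;
    }

    // Ordinary terms go through the default analyzer; terms containing
    // '*' or '?' go through the optional wildcard analyzer instead.
    String normalizeTerm(String term) {
        boolean isWildcard = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
        if (!isWildcard) {
            return defaultAnalyzer.apply(term);
        }
        return wildcardAnalyzer == null ? term : wildcardAnalyzer.apply(term);
    }

    // Toy default analyzer (assumption): lower-case plus one hard-coded stem.
    static String toyStem(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.equals("pipette") ? "pipet" : lower;
    }
}
```

A caller who only wants case folding for wildcard terms would pass something like s -> s.toLowerCase(Locale.ROOT) to setWildcardAnalyzer; a caller who never calls the setter sees no change in how wildcard terms are handled.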

I think the separation between "high-level" query parsing (i.e. handling
modifiers, +/-, ?, field prefix, AND, OR) and "low-level" term analysis is a
really good thing to have, and it'd be good if that could work similarly with
prefix queries too.

Just my 2c.,

-+ Tatu +-
