lucene-dev mailing list archives

From "D.L.B." <augustea...@rcn.com>
Subject wildcard searching issue
Date Thu, 06 Feb 2003 18:12:50 GMT
Hi everyone,

We've uncovered some perhaps undesirable behavior when doing a wildcard search 
against a stemmed index.  These issues may be part of the problems referenced 
in the thread "Too few search results".

The problem is that, for prefix and wildcard queries, the query string is not 
sent to the analyzer for tokenization (and stemming).  This can result in 
expected hits not being returned.  For example:

"pipette" gets stemmed to "pipet".  A search on "pipette*" will not match 
documents containing "pipette", because "pipette" was stemmed at index 
time to "pipet".

"cylinder" gets stemmed to "cylind".  A search on "*cylinder" will not match 
"pipetcylinder", because "pipetcylinder" was stemmed at index time to 
"pipetcylind".
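
To make the mismatch concrete, here is a small self-contained sketch.  It 
does not use Lucene at all: the stem table is a hypothetical stand-in for a 
stemming analyzer and contains only the two stems quoted above, and 
StemmedPrefixMiss, stem and prefixMatches are names invented for this 
illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StemmedPrefixMiss {
    // Hypothetical stand-in for a stemming analyzer: only the stems quoted in this message.
    static final Map<String, String> STEMS = Map.of("pipette", "pipet", "cylinder", "cylind");

    // Lowercase the word and apply the toy stem table; unknown words pass through unchanged.
    static String stem(String word) {
        String lower = word.toLowerCase();
        return STEMS.getOrDefault(lower, lower);
    }

    // Return the indexed terms that start with the given prefix.
    static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Terms are stemmed at index time, so the index holds "pipet", not "pipette".
        List<String> index = List.of(stem("pipette"), stem("cylinder"));

        // The raw query prefix is longer than the stored stem, so nothing matches.
        System.out.println(prefixMatches(index, "pipette"));

        // Stemming the prefix first finds the term.
        System.out.println(prefixMatches(index, stem("pipette")));
    }
}
```

The first lookup comes back empty because "pipet" does not start with 
"pipette"; the second finds "pipet".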

I've coded a fix to this in QueryParser.jj.  In cases like the above, strip 
the '*' from the word and send the rest to the analyzer.  If a single token 
is returned, use it to create the PrefixQuery or WildcardQuery.  So, if you 
search for "pipette*", send "pipette" to the analyzer, get "pipet" back, and 
create the PrefixQuery using "pipet", not "pipette".
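
The approach described above might look roughly like the following sketch.  
This is not the actual change to QueryParser.jj, and it does not call 
Lucene: analyze() is a hypothetical stand-in for running the configured 
analyzer (here a toy stem table with just the two stems from this message), 
and WildcardStemRewrite and rewrite() are names made up for illustration.  
In the real parser the rewritten text would be used to build the PrefixQuery 
or WildcardQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WildcardStemRewrite {
    // Hypothetical stem table standing in for a real stemming analyzer.
    static final Map<String, String> STEMS = Map.of("pipette", "pipet", "cylinder", "cylind");

    // Toy analyzer: split on whitespace, lowercase, apply the stem table.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                String lower = word.toLowerCase();
                tokens.add(STEMS.getOrDefault(lower, lower));
            }
        }
        return tokens;
    }

    // Rewrite "word*" or "*word" so the word part is analyzed first.
    // Anything else is returned unchanged.
    static String rewrite(String term) {
        boolean leading = term.startsWith("*");
        boolean trailing = term.length() > 1 && term.endsWith("*");
        if (!leading && !trailing) {
            return term; // no leading or trailing '*': not handled by this sketch
        }
        String bare = term.substring(leading ? 1 : 0, term.length() - (trailing ? 1 : 0));
        if (bare.isEmpty() || bare.contains("*") || bare.contains("?")) {
            return term; // wildcards inside the word: leave the term alone
        }
        List<String> tokens = analyze(bare);
        if (tokens.size() != 1) {
            return term; // only substitute when the analyzer yields a single token
        }
        return (leading ? "*" : "") + tokens.get(0) + (trailing ? "*" : "");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("pipette*"));  // pipet*
        System.out.println(rewrite("*cylinder")); // *cylind
    }
}
```

Falling back to the original term when the analyzer returns zero tokens or 
more than one keeps the existing behavior for inputs the single-token rule 
does not cover.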

If y'all feel this is an issue that needs fixin', let me know and I'll post my 
fix.

Thanks,
David Birtwell


