lucene-dev mailing list archives

From Timo Nentwig <>
Subject FuzzyQuery using termDocs() to reduce count of Boolean Queries
Date Wed, 07 Nov 2007 09:51:32 GMT

I asked this one already on the user mailing list but maybe it's more 
appropriate here:

As a simple example, imagine every document in your index has the fields 
"language" and "country". A tuple of language+country is what I call a 
context.

You want to search within a specific context, i.e. language+country is always 
part of the query (as a QueryFilter).

FuzzyTermEnum doesn't know about these contexts, so the resulting BooleanQuery 
contains all similar terms. E.g. "hello" is "hallo" in German, only one 
character difference. But when searching in the context english+USA I don't 
care about German terms, so I don't want or need "hallo" in the BooleanQuery 
in this context.

So I came up with the idea to use reader.termDocs() instead of terms() in 
FuzzyTermEnum. By means of a QueryFilter (its BitSet, to be precise) for each 
context I could determine whether a fuzzy term should be included in the 
BooleanQuery or not.
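
To make the idea concrete, here is a minimal sketch of the check I have in 
mind. It does not use the Lucene API: ContextTermFilter, occursInContext and 
filterCandidates are made-up names, and the int[] postings are only a stand-in 
for the document ids that termDocs() would iterate over. The BitSet plays the 
role of the QueryFilter's bits for one context.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene. Postings are modelled as plain
// int[] arrays of document ids (an assumption standing in for termDocs()).
class ContextTermFilter {

    // True if at least one document containing the term is also in the context.
    static boolean occursInContext(int[] postings, BitSet contextBits) {
        for (int doc : postings) {
            if (contextBits.get(doc)) {
                return true; // first hit is enough, skip the remaining postings
            }
        }
        return false;
    }

    // Keep only those fuzzy candidate terms that occur in the given context.
    // Iteration order of the result follows the iteration order of the map.
    static List<String> filterCandidates(Map<String, int[]> candidates,
                                         BitSet contextBits) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> e : candidates.entrySet()) {
            if (occursInContext(e.getValue(), contextBits)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

With documents 0 and 1 in the context english+USA, a candidate "hallo" that 
only occurs in documents 2 and 3 would be dropped, while "hello" in documents 
0 and 1 would be kept. Note that the check itself has to read postings for 
every candidate term, so it is not free.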

This (potentially) results in a smaller BooleanQuery, but I wonder whether this 
approach would bring any noticeable performance advantage (maybe reduced IO?).

Thanks for any feedback.
