lucene-dev mailing list archives

From Timo Nentwig <tnent...@jamba.net>
Subject FuzzyQuery using termDocs() to reduce count of Boolean Queries
Date Wed, 07 Nov 2007 09:51:32 GMT
Hi!

I already asked this on the user mailing list, but maybe it's more 
appropriate here:

As a simple example, imagine that every document in your index has the 
fields "language" and "country". A language+country tuple is what I call a 
context.

You want to search within a specific context, i.e. language+country is always 
part of the query (as a QueryFilter).

FuzzyTermEnum doesn't know about these contexts, so it builds a BooleanQuery
from all similar terms in the index. E.g. "hello" is "hallo" in German, only 
one character different. But when searching in the context english+USA I don't 
care about German terms, so I don't want or need "hallo" in the BooleanQuery 
in this case.

So I came up with the idea of using reader.termDocs() instead of terms() in 
FuzzyTermEnum. By means of a QueryFilter (or rather its BitSet) for each 
context, I could determine whether a fuzzy term occurs in any document of 
that context, and therefore whether it makes sense to include it in the 
BooleanQuery.
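
As a rough illustration of the check described above, here is a minimal
sketch in plain Java. It is not Lucene code: the int[] posting lists stand in
for what reader.termDocs() would enumerate, the BitSet stands in for the one
held by the context's QueryFilter, and the names ContextTermFilter,
occursInContext and termsInContext are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class ContextTermFilter {

    // True as soon as one document containing the term lies inside the
    // context. postings is a stand-in for iterating termDocs for one term.
    static boolean occursInContext(int[] postings, BitSet contextBits) {
        for (int docId : postings) {
            if (contextBits.get(docId)) {
                return true; // one hit is enough, stop reading postings
            }
        }
        return false;
    }

    // Keeps only those fuzzy candidates that occur in the context, i.e. the
    // terms that would still go into the BooleanQuery.
    static List<String> termsInContext(Map<String, int[]> candidatePostings,
                                       BitSet contextBits) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> e : candidatePostings.entrySet()) {
            if (occursInContext(e.getValue(), contextBits)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up toy index: docs 0 and 1 are english+USA, docs 2 and 3 are not.
        Map<String, int[]> candidates = new LinkedHashMap<>();
        candidates.put("hello", new int[] {0, 1});
        candidates.put("hallo", new int[] {2, 3});
        candidates.put("hullo", new int[] {1, 3});

        BitSet englishUsa = new BitSet();
        englishUsa.set(0);
        englishUsa.set(1);

        System.out.println(termsInContext(candidates, englishUsa));
        // prints [hello, hullo]
    }
}
```

Note that the sketch has to read at least part of each candidate term's
posting list before the query runs, which is exactly the extra cost that has
to be weighed against the smaller BooleanQuery.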

This (potentially) results in a smaller BooleanQuery, but I wonder whether the 
approach would yield any noticeable performance gain (maybe by reducing I/O?).

Thanks for feedback
Timo
