lucene-dev mailing list archives

From Timo Nentwig <>
Subject Re: FuzzyQuery using termDocs() to reduce count of Boolean Queries
Date Wed, 07 Nov 2007 16:17:27 GMT
On Wednesday 07 November 2007 10:51:32 Timo Nentwig wrote:
> Hi!
> I already asked this on the user mailing list, but maybe it's more
> appropriate here:
> As a simple example, imagine every document in your index has the fields
> "language" and "country". A tuple of language+country is what I call
> a context.
> You want to search within a context, i.e. language+country is always part
> of the query (as a QueryFilter).
> FuzzyTermEnum doesn't know about these contexts, so the BooleanQuery is
> built from all similar terms. E.g. "hello" is "hallo" in German, only one
> character different. But when searching in the English+USA context I
> don't care about German terms, so I don't want or need "hallo" in the
> BooleanQuery in this case.
> So I came up with the idea of using reader.termDocs() instead of terms() in
> FuzzyTermEnum. By means of a QueryFilter (or rather its BitSet) for

Well... I didn't read carefully enough: termDocs(Term) "returns an enumeration
of all the documents which contain term". So for each term returned by
terms() I would still have to call termDocs(). This will probably cost more
performance than the optimization gains :-\

> each context I could determine whether a fuzzy term should be
> included in the BooleanQuery or not.
> This (potentially) results in a smaller BooleanQuery, but I wonder whether
> this approach yields any noticeable performance advantage (maybe reduced
> IO?).
> Thanks for any feedback
> Timo
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
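A minimal, Lucene-free sketch of the idea discussed above, assuming a toy in-memory index: a sorted map from term to document ids stands in for terms()/termDocs(Term), a java.util.BitSet stands in for the context filter's bits, and plain Levenshtein distance with a hypothetical maxEdits threshold is a simplified stand-in for FuzzyTermEnum's similarity. The names ContextFuzzyExpansion, expand, maxEdits and postingsRead are made up for illustration; this is not Lucene's implementation. It also counts postings read, to make the per-term termDocs() cost from the reply visible.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Lucene-free sketch: a sorted in-memory map (term -> doc ids)
// stands in for IndexReader.terms() / termDocs(Term), and a java.util.BitSet
// stands in for the context filter's bits. Not Lucene's actual implementation.
class ContextFuzzyExpansion {

    // Plain Levenshtein distance (two-row dynamic programming); a simplified
    // stand-in for FuzzyTermEnum's length-normalized similarity.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Collects dictionary terms within maxEdits of query that occur in at least
    // one document of the context. postingsRead[0] counts every doc id read:
    // the extra per-term termDocs() cost the reply is worried about.
    static List<String> expand(TreeMap<String, int[]> index, String query, int maxEdits,
                               BitSet context, long[] postingsRead) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : index.entrySet()) { // ~ iterating terms()
            if (editDistance(query, entry.getKey()) > maxEdits) {
                continue; // not similar enough, never becomes a clause
            }
            for (int doc : entry.getValue()) {                    // ~ walking termDocs(term)
                postingsRead[0]++;
                if (context.get(doc)) {                           // one in-context hit suffices
                    terms.add(entry.getKey());
                    break;
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Toy index: "hallo" only occurs in (German) documents 2 and 3.
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("hallo", new int[] {2, 3});
        index.put("hello", new int[] {0, 1});
        index.put("help", new int[] {1});
        BitSet englishUsa = new BitSet();
        englishUsa.set(0, 2); // documents 0 and 1 form the English+USA context
        long[] postingsRead = {0};
        List<String> clauses = expand(index, "hello", 1, englishUsa, postingsRead);
        System.out.println(clauses + " (" + postingsRead[0] + " postings read)");
    }
}
```

Running `main` prints `[hello] (3 postings read)`: "hallo" is one edit away but is dropped because none of its documents are in the English+USA context. Note that it was rejected only after its whole postings list had been read; in a real index every such rejected candidate means a full termDocs() walk, which is exactly the overhead the reply suspects may outweigh the smaller BooleanQuery.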