lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Finding docs which contain at least x of the queryterms
Date Wed, 25 May 2005 12:28:34 GMT

On May 25, 2005, at 7:00 AM, Barbara Krausz wrote:
> Hi,
> Consider a query with e.g. 4 terms (t1, t2, t3, t4). I want to
> retrieve all documents which contain at least e.g. 3 of the
> query terms. How can I implement this?
> The first idea is to use BooleanQueries such as
> (t1 and t2 and t3 and t4) or (t1 and t2 and t3) or (t1 and t2 and
> t4) or (t1 and t3 and t4) ...
> But the performance is not very good when I have 20 query terms and I
> want to retrieve all docs which contain at least 15 of the terms.
> Can I modify the skipTo algorithm in ConjunctionScorer in order to
> achieve this?
> Thanks
> Barbara
> PS: Has anybody written a statistics class which reports how many terms
> and distinct terms are in the index, and perhaps computes the
> mean length of the documents in the index with the standard deviation?
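
To put a number on why the enumerated BooleanQuery gets slow: a document that contains more than the minimum number of terms already matches one of the minimum-size conjunctions, so only those need to be listed, and there are C(n, m) of them for m out of n terms. The sketch below just does that arithmetic; the class and method names are made up for illustration and are not part of Lucene.

```java
import java.math.BigInteger;

public class ClauseCount {
    // Binomial coefficient C(n, k): the number of k-term conjunctions
    // that can be formed from n query terms.
    static BigInteger choose(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i the running value equals C(n - k + i, i),
            // so the division by i is always exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("3 of 4 terms:   " + choose(4, 3) + " conjunctions");
        System.out.println("15 of 20 terms: " + choose(20, 15) + " conjunctions");
    }
}
```

For 15 of 20 terms that comes to 15504 conjunctions of 15 terms each, which is why the enumeration does not scale.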

There is an interesting trick you can play with a custom Similarity  
class on a BooleanQuery - check out the coord method.  This could be  
used to ensure that an "overlap" of 3 is mandatory for a match, for  

I'll leave the details of this as an exercise to the reader for the  
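
One possible reading of the coord idea, as a rough sketch rather than a worked-out solution: Lucene's Similarity exposes coord(int overlap, int maxOverlap), where overlap is the number of query terms a document matched, and the default implementation returns overlap / maxOverlap. A custom version could return 0 below a threshold. The standalone class below only mirrors that method shape so it runs without Lucene on the classpath. The class name and the minOverlap parameter are hypothetical, and it is an assumption here that all terms are added as optional clauses of a single BooleanQuery and that zero-scored documents are dropped from the results.

```java
public class MinOverlapCoord {
    // Hypothetical threshold: minimum number of query terms a document must match.
    private final int minOverlap;

    public MinOverlapCoord(int minOverlap) {
        this.minOverlap = minOverlap;
    }

    // Same shape as Similarity.coord(int overlap, int maxOverlap) in Lucene.
    // In a real setup this logic would sit in an override inside a
    // Similarity subclass set on the searcher (assumption, not shown here).
    public float coord(int overlap, int maxOverlap) {
        if (overlap < minOverlap) {
            // Too few terms matched: zero out the score contribution.
            return 0.0f;
        }
        // Otherwise behave like the default coordination factor.
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        MinOverlapCoord sim = new MinOverlapCoord(3);
        for (int overlap = 1; overlap <= 4; overlap++) {
            System.out.println(overlap + " of 4 terms matched, coord = "
                    + sim.coord(overlap, 4));
        }
    }
}
```

Whether a zero coord actually removes a document from the hits depends on how the searcher or collector treats zero scores in the Lucene version in use, so that is worth checking before relying on it.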

