lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Finding docs which contain at least x of the queryterms
Date Wed, 25 May 2005 12:28:34 GMT

On May 25, 2005, at 7:00 AM, Barbara Krausz wrote:
> Hi,
>
> Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to  
> retrieve all documents which contain at least e.g. 3 of the  
> queryterms. How can I implement this?
> The first idea is to use BooleanQueries such as
> (t1 and t2 and t3 and t4) or (t1 and t2 and t3) or (t1 and t2 and
> t4) or (t1 and t3 and t4) ...
>
> But the performance is not very good when I have 20 query terms and I
> want to retrieve all docs which contain at least 15 of the terms.
> Can I modify the skipTo algorithm in ConjunctionScorer in order to
> achieve this?
>
> Thanks
> Barbara
>
> PS: Has anybody written a statistics class which reports how many terms
> and distinct terms are in the index, and perhaps computes the
> mean length of the documents in the index with the standard deviation?

There is an interesting trick you can play with a custom Similarity  
class on a BooleanQuery - check out the coord method.  This could be  
used to ensure that an "overlap" of 3 is mandatory for a match, for  
example.

I'll leave the details of this as an exercise to the reader for the  
moment.
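
As a rough starting point, the core of the coord idea might look like the
sketch below. It is plain Java with no Lucene dependency so that it runs on
its own; the class name MinOverlapCoord and the minOverlap parameter are
made up for illustration. In a real application the same logic would go in
an override of Similarity.coord(int overlap, int maxOverlap) in a subclass
of DefaultSimilarity. It assumes a flat BooleanQuery of optional term
clauses and a search path that drops hits whose score is not positive.

```java
/**
 * Standalone sketch of a coord function that enforces a minimum overlap.
 * Hypothetical class: in Lucene this logic would live in an override of
 * Similarity.coord(int, int) in a DefaultSimilarity subclass.
 */
class MinOverlapCoord {

    // Hypothetical parameter: how many query terms a document must contain.
    private final int minOverlap;

    MinOverlapCoord(int minOverlap) {
        if (minOverlap < 0) {
            throw new IllegalArgumentException("minOverlap must be >= 0");
        }
        this.minOverlap = minOverlap;
    }

    /**
     * overlap    = number of query terms matched by the document
     * maxOverlap = total number of terms in the query
     */
    float coord(int overlap, int maxOverlap) {
        if (overlap < minOverlap) {
            // A zero factor zeroes the document's score. This only removes
            // the document if the collector skips non-positive scores
            // (an assumption about the search path in use).
            return 0.0f;
        }
        // Otherwise keep the usual fraction-of-terms-matched factor.
        return maxOverlap == 0 ? 1.0f : (float) overlap / maxOverlap;
    }
}
```

With a Similarity subclass carrying this override, installing it on the
searcher (for example via setSimilarity) before running the BooleanQuery
should be enough to try it out. For the 15-of-20 case the threshold would
be 15 rather than 3.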

     Erik

