lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Finding docs which contain at least x of the queryterms
Date Wed, 25 May 2005 16:29:46 GMT
On Wednesday 25 May 2005 13:00, Barbara Krausz wrote:
> 
> >
> Hi,
> 
> Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to retrieve all 
> documents which contain at least e.g. 3 of the queryterms. How can I 
> implement this?
> The first idea is to use BooleanQueries such as
> (t1 and t2 and t3 and t4) or (t1 and t2 and t3) or(t1 and t2 and t4) or 
> (t1 and t3 and t4).....
> 
> But the perfomance is not very good when I have 20 queryterms and I want 
> to retrieve all docs which contain at least 15 of the terms.
> Can I modify the skipto-algorithm in ConjunctionScorer in order to 
> achieve this?

I don't think so, but in case you can describe a method to do this,
please share it.

In the svn trunk there is a DisjunctionSumScorer that has the
minimum number of subquery matchers as a constructor parameter:

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/

It has this javadoc comment in the advanceAfterCurrent method:
* @todo Investigate whether it is possible to use skipTo() when
* the minimum number of matchers is bigger than one, ie. try and use the
* character of ConjunctionScorer for the minimum number of matchers.

The constructor parameter is not used (even in the trunk), so you'll have
to write the code to use it yourself. I'd recommend to start from the trunk
and extend BooleanQuery for this.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message