lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: 2/3 of terms matched + coverage filter
Date Wed, 31 Oct 2007 16:35:55 GMT
On Wednesday 31 October 2007 14:51:12 Tobias Hill wrote:
> My documents all have a field with a variable number of terms
> (but rather few):
> Doc1.field = "foo bar gro"
> Doc2.field = "foo bar gro mot slu"
> Now I would like to search using the terms "foo bar gro"
>
> Problem 1:
> I would like to express that at least two of the three terms
> must match. Do I have to construct this clause myself, i.e.
> "(foo & bar) | (foo & gro) | (bar & gro)", or is there some
> clever way to do this?

BooleanQuery.setMinimumNumberShouldMatch(int) does this when the
terms are added as optional clauses; have a look at the javadocs
for the details.
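
To make the semantics concrete without depending on a particular
Lucene version, here is a minimal plain-Java sketch of what "at least
N of the optional terms must match" means. The class and method names
are hypothetical and only model the behaviour; this is not Lucene code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of "minimum number should match"; not part of Lucene.
final class MinShouldMatchSketch {

    /** Counts how many distinct query terms occur among the field's terms. */
    static int matchedCount(List<String> queryTerms, List<String> fieldTerms) {
        Set<String> field = new HashSet<>(fieldTerms);
        int matched = 0;
        for (String term : new HashSet<>(queryTerms)) {
            if (field.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    /** True when at least minShouldMatch distinct query terms occur in the field. */
    static boolean atLeast(int minShouldMatch, List<String> queryTerms, List<String> fieldTerms) {
        return matchedCount(queryTerms, fieldTerms) >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("foo", "bar", "gro");
        // Field containing two of the three query terms.
        System.out.println(atLeast(2, query, Arrays.asList("foo", "gro", "mot")));
        // Field containing only one of the three query terms.
        System.out.println(atLeast(2, query, Arrays.asList("foo", "mot", "slu")));
    }
}
```

With a minimum of 2 and the three terms foo, bar and gro, this accepts
the same documents as the hand-built "(foo & bar) | (foo & gro) |
(bar & gro)", without enumerating the pairs.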

>
> Problem 2:
> I would like to express that if the doc.field has too many terms
> that were not matched, it should not be included in the result
> at all. In the example above, Doc2 might have too many
> unmatched terms to be included in the result.
> Is this kind of query possible, and how?
>
> The general case:
> I want to find those docs that have X% of the search terms
> matched and where the actual match covers at least Y% of
> the available terms in the document.

A Y% coverage cutoff is not directly possible in a query, but I would
expect the default document score to correlate reasonably well with
coverage.
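
If the field's terms are available to the application (for example
because the field is also stored, which is an assumption here), the
two percentages can be checked outside Lucene after the search. The
following is only a sketch of that arithmetic; the class name, method
names and thresholds are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-filter computing the X% and Y% from the question.
final class CoverageSketch {

    /** Fraction of distinct query terms that occur in the field (the X%). */
    static double queryFraction(List<String> queryTerms, List<String> fieldTerms) {
        Set<String> query = new HashSet<>(queryTerms);
        if (query.isEmpty()) {
            return 0.0;
        }
        Set<String> field = new HashSet<>(fieldTerms);
        int matched = 0;
        for (String term : query) {
            if (field.contains(term)) {
                matched++;
            }
        }
        return (double) matched / query.size();
    }

    /** Fraction of the field's terms that are query terms (the Y%). */
    static double coverage(List<String> queryTerms, List<String> fieldTerms) {
        if (fieldTerms.isEmpty()) {
            return 0.0;
        }
        Set<String> query = new HashSet<>(queryTerms);
        int covered = 0;
        for (String term : fieldTerms) {
            if (query.contains(term)) {
                covered++;
            }
        }
        return (double) covered / fieldTerms.size();
    }

    /** Keeps a document only when both thresholds (given as fractions) are met. */
    static boolean accept(List<String> queryTerms, List<String> fieldTerms,
                          double minQueryFraction, double minCoverage) {
        return queryFraction(queryTerms, fieldTerms) >= minQueryFraction
                && coverage(queryTerms, fieldTerms) >= minCoverage;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("foo", "bar", "gro");
        List<String> doc1 = Arrays.asList("foo", "bar", "gro");
        List<String> doc2 = Arrays.asList("foo", "bar", "gro", "mot", "slu");
        // Require 2/3 of the query terms and 75% coverage of the field.
        System.out.println(accept(query, doc1, 2.0 / 3.0, 0.75));
        System.out.println(accept(query, doc2, 2.0 / 3.0, 0.75));
    }
}
```

For the two example documents, Doc1 is fully covered by the query
while only three of Doc2's five terms are, so a 75% coverage threshold
keeps Doc1 and drops Doc2.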

In case you want an exact Y% cutoff, you'll run into the fact
that the field norm (the inverse square root of the field length)
is encoded in only 8 bits, which is rather coarse.
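
To give a feel for how coarse that is, here is a small self-contained
sketch of a one-byte float encoding. It is modelled on my recollection
of the scheme Lucene uses for norms (SmallFloat.floatToByte315), so
treat the constants as an assumption rather than as the library's
code; the class and method names are hypothetical.

```java
// Hypothetical re-creation of a one-byte float encoding; constants are assumed.
final class NormByteSketch {

    private static final int ZERO_POINT = (63 - 15) << 3;

    /** Squeezes a non-negative float into one byte by dropping low-order bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;               // keep sign, exponent, top fraction bits
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // zero, negative or underflow
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1;                 // overflow: largest representable value
        }
        return (byte) (small - ZERO_POINT);
    }

    /** Expands the byte back to a float; the dropped bits come back as zeros. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Length norm as described above: 1 / sqrt(number of terms in the field). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Print the norm that survives the round trip for a few field lengths.
        for (int len = 1; len <= 10; len++) {
            float norm = lengthNorm(len);
            System.out.println(len + " terms: exact " + norm
                    + ", stored " + decode(encode(norm)));
        }
    }
}
```

Under this encoding, fields of 3 and 4 terms end up with the same
stored norm, as do fields of 9 and 10 terms, so a cutoff based on the
norm cannot tell such lengths apart.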

Regards,
Paul Elschot
