lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts
Date Thu, 24 May 2007 18:24:31 GMT
Have a look at DisjunctionMaxQuery; I think it might help,
although I am not sure it will fully cover your case.
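
The two schemes described in the question below can be sketched in plain Java, without any Lucene dependency. This is only an illustration of the intended ranking behaviour, not Lucene's scoring formula: the class name UniqueTermScoring and its methods are hypothetical, and documents are assumed to be already tokenized into upper-case terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; none of these names come from Lucene.
public class UniqueTermScoring {

    /** Scheme 1: number of distinct query terms that occur at least once in the document. */
    static int uniqueTermScore(List<String> queryTerms, List<String> docTerms) {
        Set<String> present = new HashSet<>(docTerms);
        int hits = 0;
        // De-duplicate the query so a repeated query term is not counted twice.
        for (String term : new HashSet<>(queryTerms)) {
            if (present.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    /** Scheme 2: total number of occurrences of any query term in the document. */
    static int rawFrequencyScore(List<String> queryTerms, List<String> docTerms) {
        Set<String> wanted = new HashSet<>(queryTerms);
        int hits = 0;
        for (String term : docTerms) {
            if (wanted.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    /** Lets the caller switch between the two schemes with a flag. */
    static int score(List<String> queryTerms, List<String> docTerms, boolean byUniqueTerms) {
        return byUniqueTerms
                ? uniqueTermScore(queryTerms, docTerms)
                : rawFrequencyScore(queryTerms, docTerms);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("BIRD", "CAT", "DOG");
        List<String> docA = Arrays.asList("BIRD", "CAT", "DOG");
        List<String> docB = Arrays.asList("BIRD", "CAT", "CAT", "CAT", "CAT", "CAT", "CAT", "CAT");

        System.out.println("unique terms:  A=" + score(query, docA, true)
                + " B=" + score(query, docB, true));
        System.out.println("raw frequency: A=" + score(query, docA, false)
                + " B=" + score(query, docB, false));
    }
}
```

Under the unique-term scheme the BIRD/CAT/DOG document ranks above the document with repeated CATs, and under raw frequency the order flips. Mapping this onto Lucene would probably mean a custom Similarity whose tf() flattens repeated occurrences, as the thread cited in the question attempts; that mapping is an assumption here, not something confirmed in this reply.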


On May 24, 2007, at 11:22 AM, Walt Stoneburner wrote:

> Hi,
>  I'm trying to figure out what I need to do with Lucene to score a
> document higher when it hits a larger number of unique search terms,
> rather than scoring on term frequency counts.
>  A quick example.
>  If I'm searching for "BIRD CAT DOG" (all should clauses), then I want
>   ...a document with the terms BIRD, CAT, and DOG, each appearing only
> once, to score higher than
>   ...a document with BIRD, CAT, CAT, CAT, CAT, CAT, CAT, CAT.
>  The rationale behind this is that if something "fits" my query
> better by hitting more terms, I don't want it to be drowned out by a
> document that simply mentions a subset of keywords a lot of times.
>  And, the tricky part: ideally I'd like to be able to switch between
> the two schemes, so the user can get documents scored either way.
>  So far I've been reading the 'score and frequency' thread at
>, where
> Niraj seems to have a similar problem.  He tries things like overriding
> the term frequency function, tf(), and setting the default similarity.
>  Unfortunately, it isn't long before the reply chain is 18 layers
> deep (I counted), and it never becomes clear whether a solution was
> reached, so I wasn't certain if I was on the right research path or
> not.  It started to appear that some of the scoring might be done at
> index time, but that didn't make sense to me, since weights and such
> can be done at query time.
>  Is there any way to have Lucene score based on the discrete number
> of unique terms found, rather than how often a given term appears in a
> document?
> Thanks,
> -wls
> ps.  When replying to this, it'd be great if content not pertinent to
> the reply were trimmed in the response.  I don't want to cause a
> similar message snowball to roll down the hill, picking up angle
> brackets along the way.

Grant Ingersoll
Center for Natural Language Processing

Read the Lucene Java FAQ at 
