lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts
Date Thu, 24 May 2007 18:24:31 GMT
Have a look at DisjunctionMaxQuery. I think it might help,
although I am not sure it will fully cover your case.
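
To make the two schemes in the question concrete, here is a minimal
standalone sketch in plain Java. It is not Lucene code and does not
reproduce Lucene's scoring formula; the class and method names
(UniqueTermScoring, uniqueTermScore, termFrequencyScore, score) are
hypothetical and exist only for illustration. It assumes documents are
already tokenized into upper-case terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a toy scorer, NOT part of Lucene and NOT its formula.
public class UniqueTermScoring {

    /** Counts how many distinct query terms occur at least once in the document. */
    public static int uniqueTermScore(List<String> queryTerms, List<String> docTerms) {
        Set<String> present = new HashSet<>(docTerms);
        int hits = 0;
        // De-duplicate the query so a repeated query term is counted once.
        for (String term : new HashSet<>(queryTerms)) {
            if (present.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    /** Counts every occurrence in the document of any query term. */
    public static int termFrequencyScore(List<String> queryTerms, List<String> docTerms) {
        Set<String> wanted = new HashSet<>(queryTerms);
        int hits = 0;
        for (String term : docTerms) {
            if (wanted.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    /** Lets the caller switch between the two schemes at query time. */
    public static int score(List<String> queryTerms, List<String> docTerms, boolean uniqueOnly) {
        return uniqueOnly
                ? uniqueTermScore(queryTerms, docTerms)
                : termFrequencyScore(queryTerms, docTerms);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("BIRD", "CAT", "DOG");
        List<String> docA = Arrays.asList("BIRD", "CAT", "DOG");
        List<String> docB = Arrays.asList(
                "BIRD", "CAT", "CAT", "CAT", "CAT", "CAT", "CAT", "CAT");

        // Unique-term scheme: docA (3) outranks docB (2).
        System.out.println(score(query, docA, true) + " vs " + score(query, docB, true));
        // Frequency scheme: docB (8) outranks docA (3).
        System.out.println(score(query, docA, false) + " vs " + score(query, docB, false));
    }
}
```

The boolean flag is only one way to expose the switch. Within Lucene
itself, the thread cited in the question discusses overriding tf() in a
custom similarity, which is the closest analogue to the uniqueOnly flag
here; whether that alone is enough for this case is not settled in this
message.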

-Grant

On May 24, 2007, at 11:22 AM, Walt Stoneburner wrote:

> Hi,
>
>  I'm trying to figure what I need to do with Lucene to score a
> document higher when it has a larger number of unique search terms
> that are hit, rather than term frequency counts.
>
>  A quick example.
>
>  If I'm searching for "BIRD CAT DOG" (all should clauses), then I want
>
>   ...a document containing the terms BIRD, CAT, and DOG, each
> appearing only once, to score higher than
>
>   ...a document with BIRD, CAT, CAT, CAT, CAT, CAT, CAT, CAT.
>
>  The rationale behind this is that if something "fits" my query
> better by hitting more terms, I don't want it to be drowned out by a
> document that simply mentions a subset of keywords a lot of times.
>
>  And, the tricky part: ideally I'd like to be able to switch between
> the two schemes, so the user can get documents scored either way.
>
>
>  So far I've been reading the 'score and frequency' thread at
> http://www.gossamer-threads.com/lists/lucene/java-user/8916, where
> Niraj seems to have a similar problem.  He tries things overriding
> term frequencies, tf(), and setting the default similarity.
>
>  Unfortunately, it isn't long before the reply chain is 18 layers
> deep (I counted), and it never becomes clear if a solution was
> reached, so I wasn't certain if I was on the right research path or
> not.  It started to appear that some of the scoring might be done at
> index time, but that didn't make sense to me, since weights and such
> can be done at query time.
>
>  Is there any way to have Lucene score based on the discrete number
> of unique terms found, rather than how often a given term appears in a
> document?
>
> Thanks,
> -wls
> ps.  When replying to this, it'd be great if content not pertinent to
> the reply were trimmed in the response.  I don't want to cause a
> similar message snowball to roll down the hill, picking up angle
> brackets along the way.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at
http://wiki.apache.org/jakarta-lucene/LuceneFAQ




