lucene-java-user mailing list archives

From Peter Keegan <peterlkee...@gmail.com>
Subject Re: sloppyFreq question
Date Fri, 20 Mar 2009 12:45:34 GMT
Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the
terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the
terms are indexed at the same position. In doc 2, the terms are indexed in
adjacent positions (normal way). For the query "the quick brown fox", doc 1
will score higher than doc 2 because the sloppyFreq = 1/2 for doc 1 and 1/5
for doc 2. So, the term frequency factor for the score takes into account
both the number of matching terms and the distance between them.
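
For illustration, here is a minimal Java sketch of that arithmetic. It does not use Lucene itself. It assumes the DefaultSimilarity formulas sloppyFreq(distance) = 1/(distance + 1) and tf(freq) = sqrt(freq), and it assumes the span scorer passes the span length (end - start) as the distance. The class name and the span lengths 1 and 4 are illustrative, inferred from the 1/2 and 1/5 figures above.

```java
// Illustrative sketch only; this is not Lucene code.
final class SloppyFreqSketch {

    // Assumed to mirror DefaultSimilarity.sloppyFreq: 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed to mirror DefaultSimilarity.tf: square root of the frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Assumed span lengths (end - start) for the query "the quick brown fox".
        int doc1Length = 1; // doc 1: all terms indexed at the same position
        int doc2Length = 4; // doc 2: terms indexed at four adjacent positions

        float doc1Freq = sloppyFreq(doc1Length);
        float doc2Freq = sloppyFreq(doc2Length);

        // The shorter span yields the larger frequency, hence the larger tf factor.
        System.out.println("doc 1: sloppyFreq=" + doc1Freq + " tf=" + tf(doc1Freq));
        System.out.println("doc 2: sloppyFreq=" + doc2Freq + " tf=" + tf(doc2Freq));
    }
}
```

Only the tf factor is modeled here; the other parts of the score (idf, norms, boosts) are left out.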

This is fine for most span queries with more than one term (I guess), but I'd
still suggest that a simple SpanTermQuery should behave more like TermQuery
w.r.t. sloppyFreq. Fortunately, all of this can be overridden in the
Similarity class. No real problem here; the discussions are quite helpful,
though.


Thanks,
Peter

On Tue, Mar 17, 2009 at 7:18 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : > I suppose SpanTermQuery could override the weight/scorer methods so that
> : > it behaved more like a TermQuery if it was executed directly ... but
> : > that's really not what it's intended for.
> :
> : This is currently the only way to boost a term via payloads.
> : BoostingTermQuery extends SpanTermQuery.
>
> probably because it was the easiest way to get at the payload ... another
> reason to change SpanTermQuery's weight i guess.
>
> : > if you're talking about a SpanNearQuery of "the quick brown fox" vs a
> : > SpanNearQuery of "brown fox" -- both against some doc like "hey look, the
> : > quick brown fox jumped very high" -- then sure, that doc might produce a
> : > lower score for the first query than it does for the second query ... but
> : > scores from different queries aren't comparable.
> :
> : Yes, this is the case I meant. To the casual observer, they both appear to
> : be "exact matches" with respect to term frequency. However, I realize that
> : the first query would score higher than the 2nd if all 4 terms were indexed
> : at the same position. I guess this is part of the point you're making about
> : spans. Would a plain PhraseQuery behave this way, too?
>
> i think you're missing my point -- it's not specific to spans: it doesn't
> mean *anything* to say "the first query would score higher than the 2nd"
> because scores aren't comparable between queries.  (unless you really go
> out of your way to make them comparable by customizing Similarity, and
> ensuring that they have the exact same structure -- a SpanNearQuery
> containing 5 SpanTermQueries doesn't have the same structure as a
> SpanNearQuery containing two SpanTermQueries)
>
>
>
> -Hoss