lucene-dev mailing list archives

From "Mike Klaas (JIRA)" <>
Subject [jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?
Date Tue, 03 Feb 2009 02:08:00 GMT


Mike Klaas commented on LUCENE-1534:

[quote]But if we feel that over-emphasizes terms with large idfs, then we should not remove
an idf factor from one vector, but rather rework our weight heuristic, perhaps replacing idf
with sqrt(idf), no?[/quote]

FWIW, having implemented web search on a large (500m) corpus, we found the stock idf factor
in Lucene was too high, and ended up sqrt()'ing it in Similarity.

That said, much of the score in this system came from anchor text, link analysis scores, and
term proximity, so it's hard to measure the impact of the idf change independently.
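
To make the sqrt() change concrete, here is a minimal standalone sketch, not the code used
in that system. It assumes the stock formula is the Lucene 2.x DefaultSimilarity one,
log(numDocs / (docFreq + 1)) + 1, and it deliberately avoids depending on Lucene itself.
The class name, the method names, and the document frequencies are hypothetical; the corpus
size is the 500m figure from the comment, read as a document count. In a real setup the
same change would presumably live in an idf() override on a Similarity subclass.

```java
// Standalone sketch: compares the stock idf with a sqrt-dampened idf.
// Names here (SqrtIdfSketch, stockIdf, sqrtIdf) are illustrative, not Lucene API.
final class SqrtIdfSketch {

    // Assumed stock formula (Lucene 2.x DefaultSimilarity): log(numDocs / (docFreq + 1)) + 1.
    static float stockIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Dampened variant: the square root of the stock idf.
    static float sqrtIdf(int docFreq, int numDocs) {
        return (float) Math.sqrt(stockIdf(docFreq, numDocs));
    }

    public static void main(String[] args) {
        int numDocs = 500_000_000;                  // corpus size taken from the comment
        int[] docFreqs = {10, 10_000, 10_000_000};  // made-up rare, medium, common terms

        for (int df : docFreqs) {
            float stock = stockIdf(df, numDocs);
            float damped = sqrtIdf(df, numDocs);
            // If idf enters both the query weight and the document weight, the per-term
            // contribution scales with idf squared; with sqrt(idf) in both places it
            // scales with idf to the first power.
            System.out.printf("docFreq=%d  stock=%.3f  stock^2=%.3f  sqrt=%.3f  sqrt^2=%.3f%n",
                    df, stock, stock * stock, damped, damped * damped);
        }
    }
}
```

The point of the comparison is the gap between rare and common terms: squaring the stock idf
widens it, while the sqrt variant brings the net per-term factor back to a single idf.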

> idf(t) is not actually squared during scoring?
> ----------------------------------------------
>                 Key: LUCENE-1534
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
> The javadocs for Similarity:
> show idf(t) as being squared when computing net query score.  But I
> don't think it is actually squared, in looking at the sources?  Maybe
> it used to be, eg this interesting discussion:
> Or am I missing something?  We just need to fix the javadocs to take
> away the "squared"...

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
