lucene-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Whither Query Norm?
Date Sat, 21 Nov 2009 00:56:29 GMT
On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller <markrmiller@gmail.com> wrote:

> Okay - my fault - I'm not really talking in terms of Lucene. Though even
> there I consider it possible. You'd just have to like, rewrite it :) And
> it would likely be pretty slow.
>

Rewrite it how?  When you index the very first document, the docFreq of every
term is 1, out of numDocs = 1 docs in the corpus, so everybody's idf is the
same.  No matter how you normalize this, it'll be wrong once you've indexed a
million documents.  This isn't a matter of Lucene architecture; it's a matter
of idf being a value that is exactly available only at query time (you can
approximate it partway through indexing, but you don't know it at all when
you start).
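
To make that concrete, here is a rough sketch in Python (not Lucene code).
It assumes the classic idf formula 1 + ln(numDocs / (docFreq + 1)); the toy
corpus and the helper names idf and doc_freqs are made up for illustration.

```python
import math

def idf(doc_freq, num_docs):
    # Assumed formula for illustration: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def doc_freqs(docs):
    """Count, for each term, how many documents contain it."""
    df = {}
    for doc in docs:
        for term in set(doc.split()):
            df[term] = df.get(term, 0) + 1
    return df

# Hypothetical toy corpus; "the" appears in every document.
corpus = ["the lucene index", "the query", "the norm", "the scorer"]

# State of the world after indexing only the first document:
# every term has docFreq = 1 and numDocs = 1, so all idfs coincide.
first = doc_freqs(corpus[:1])
idf_first = {t: idf(df, 1) for t, df in first.items()}

# State of the world once the whole corpus is indexed:
# the common term and the rare term now get different idfs.
full = doc_freqs(corpus)
idf_full = {t: idf(df, len(corpus)) for t, df in full.items()}

print("after 1 doc: ", idf_first)
print("after 4 docs:", idf_full)
```

Anything baked into the index from idf_first would treat "the" and "lucene"
identically, which is exactly what stops being true as the corpus grows.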

  -jake
