lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Whither Query Norm?
Date Sat, 21 Nov 2009 01:02:32 GMT
Go back and put it in after you have all the documents for that commit point.
Or calculate it on reader load.
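Here is a minimal sketch of the second option. It assumes the classic DefaultSimilarity-style formula idf = 1 + ln(numDocs / (docFreq + 1)). Once the document count for a commit point is fixed, for example when a reader is opened, idf can be computed for every term in one pass. The class `IdfSnapshot`, its methods, and the plain term-to-docFreq map are hypothetical stand-ins for statistics a real index would supply. They are not Lucene APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compute idf for every term once numDocs is final for a
// commit point (e.g. on reader load), instead of per document while indexing.
// The term -> docFreq map is an assumed stand-in, not a Lucene API.
public class IdfSnapshot {

    // Assumed DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Computes idf for all terms against one fixed numDocs value, so every
    // term is scored against the same corpus snapshot.
    static Map<String, Double> computeOnReaderLoad(Map<String, Integer> docFreqs, int numDocs) {
        Map<String, Double> idfs = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            idfs.put(e.getKey(), idf(e.getValue(), numDocs));
        }
        return idfs;
    }

    public static void main(String[] args) {
        // Illustrative counts only.
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("lucene", 900);
        docFreqs.put("norm", 10);
        System.out.println(computeOnReaderLoad(docFreqs, 1_000));
    }
}
```

Every value is computed against the same numDocs, so the results stay consistent with each other. They would need to be recomputed at the next commit point.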

- Mark

http://www.lucidimagination.com (mobile)

On Nov 20, 2009, at 7:56 PM, Jake Mannix <jake.mannix@gmail.com> wrote:

>
> On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> Okay - my fault - I'm not really talking in terms of Lucene. Though even
>> there I consider it possible. You'd just have to, like, rewrite it :) And
>> it would likely be pretty slow.
>
> Rewrite it how? When you index the very first document, the docFreq of all
> terms is 1, out of numDocs = 1 docs in the corpus. Everybody's idf is the
> same. No matter how you normalize this, it'll be wrong once you've indexed a
> million documents. This isn't a matter of Lucene architecture; it's a matter
> of idf being a value that is only exactly available at query time (you can
> approximate it partway through indexing, but you don't know it at all when
> you start).
>
>   -jake
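This small hypothetical sketch makes Jake's point concrete. It uses the same assumed DefaultSimilarity-style formula, idf = 1 + ln(numDocs / (docFreq + 1)). After the first document, every term gets the same idf. After a million documents, a rare term and a common term end up far apart. Any normalization baked in at the start therefore cannot match the final values. The class name and the document counts are illustrative assumptions, not measured data.

```java
// Hypothetical sketch: idf fixed at index time drifts away from the value the
// finished corpus implies. Formula and counts are assumptions for illustration.
public class IdfDrift {

    // Assumed DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // After the very first document: every term has docFreq = 1, numDocs = 1,
        // so a future "common" term and a future "rare" term look identical.
        double commonAtStart = idf(1, 1);
        double rareAtStart = idf(1, 1);

        // After a million documents (illustrative counts):
        long numDocs = 1_000_000;
        double commonAtEnd = idf(500_000, numDocs); // term in half the docs
        double rareAtEnd = idf(10, numDocs);        // term in ten docs

        System.out.printf("start: common=%.3f rare=%.3f%n", commonAtStart, rareAtStart);
        System.out.printf("end:   common=%.3f rare=%.3f%n", commonAtEnd, rareAtEnd);
    }
}
```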
