lucene-dev mailing list archives

From Chuck Williams <>
Subject Re: Include BM25 in Lucene?
Date Tue, 17 Oct 2006 19:41:33 GMT
Vic Bancroft wrote on 10/17/2006 02:44 AM:
> In some of my group's usage of lucene over large document collections,
> we have split the documents across several machines.  This has led to
> a concern about whether the inverse document frequency is appropriate,
> since the score seems to be dependent on the partitioning of documents
> over indexing hosts.  We have not formulated an experiment to
> determine whether it seriously affects our results, though it has been
> discussed.

What version of Lucene are you using?  Are you using
ParallelMultiSearcher to manage the distributed indexes or have you
implemented your own mechanism?  There was a bug a couple of years ago,
in the 1.4.3 version as I recall, where ParallelMultiSearcher was not
computing df's correctly, but that has been fixed for a long time now.
The df's used for scoring are the sums of the df's from each distributed
index and thus are independent of the partitioning.
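
To illustrate the point, here is a minimal sketch in plain Python (not
Lucene code) of why summed df's do not depend on how documents are split.
The function names and the toy documents are hypothetical, and the idf
formula, 1 + ln(numDocs / (docFreq + 1)), is assumed to match the default
similarity of that era; treat it as an illustration rather than the actual
implementation.

```python
import math

def idf(doc_freq, num_docs):
    # Assumed formula: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def local_df(partition, term):
    # Number of documents in one partition that contain the term.
    return sum(1 for doc in partition if term in doc)

def global_idf(partitions, term):
    # Sum df and document counts over all partitions before computing idf.
    df = sum(local_df(part, term) for part in partitions)
    num_docs = sum(len(part) for part in partitions)
    return idf(df, num_docs)

# Toy collection: each document is a set of terms (hypothetical data).
docs = [
    {"lucene", "search"},
    {"lucene", "index"},
    {"bm25", "ranking"},
    {"search", "ranking"},
    {"lucene"},
    {"index"},
]

# Two different ways of partitioning the same six documents.
split_a = [docs[:3], docs[3:]]
split_b = [docs[:1], docs[1:5], docs[5:]]

idf_a = global_idf(split_a, "lucene")
idf_b = global_idf(split_b, "lucene")

# idf computed from each partition of split_a alone, for comparison.
local_idfs = [idf(local_df(part, "lucene"), len(part)) for part in split_a]
```

Here idf_a and idf_b are equal because both are computed from the same
totals, while the per-partition values in local_idfs differ from each
other, which is the partition dependence the question was concerned about.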

