lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <>
Subject Re: Scaling
Date Fri, 18 Jul 2008 16:51:48 GMT
>>I have no clue how large the impact could be 

I did do some benchmarking of a scoring scheme based on local idf vs one with visibility of
a global idf.
Using randomized allocation of documents to shards and sufficient volumes of content in each
index, the local idf policy produced identical top results to the global idf policy for the
vast majority of searches.


----- Original Message ----
From: Karl Wettin <>
Sent: Friday, 18 July, 2008 2:33:29 PM
Subject: Re: Scaling

18 jul 2008 kl. 09.49 skrev Eric Bowman:

> One thing I have trouble understanding is how scoring works in this  
> case.  Does Lucene really "just work", or are there special things  
> we have to do to make sure that the scores are coherent so we can  
> actually decide which was the best match?  What kind of constraints  
> are there when breaking up the index into parts to make sure scoring  
> remains coherent?

AFAIK the score would suffer from splitting up the index as tf/idf  
then only represent a part of the index, i.e. two identical docments  
in two indices would end up with different scores as the index meta  
data is different. I have no clue how large the impact could be nor if  
there are good and bad ways to split an index.

One solution I can think of is to share complete index over all nodes  
but restrict the results from each node to a subset of the index using  
a filter. This should produce the right score but will probably be a  
bit slower than splitting the index.

Perhaps it would be possible to split the index for searching but use  
an alternative source for scoring.


To unsubscribe, e-mail:
For additional commands, e-mail:

Not happy with your email address?.
Get the one you really want - millions of new email addresses available now at Yahoo!

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message