lucene-general mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Infrastructure for large Lucene index
Date Fri, 06 Oct 2006 22:10:31 GMT
On 10/6/06, Slava Imeshev <imeshev@yahoo.com> wrote:
> -- James <james@ryley.com> wrote:
> > > If the index is broken into multiple "shards" then we need multiple copies
> > > of each shard, and some way of loadbalancing and failing over amongst copies
> > > of shards.
> >
> > Yep.  Unfortunately it's not simple, but those are all pieces of what we are
> > currently in the process of implementing.
>
> The problem is that over time indexes develop "personality" and the term frequency
> can vary significantly from index to index....

A global idf calculation is possible though... MultiSearcher already
does this when searching across multiple indices.  The downside of
doing it across remote indices is an increase in the number of RPC
calls.  In general, it's probably better to try to keep index shards
balanced.
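
To make the idea concrete, here is a rough sketch of what a global idf
amounts to: sum each term's document frequency and the document counts
over all shards, then apply the idf formula once to the totals.  It
assumes the classic Lucene formula, log(numDocs / (docFreq + 1)) + 1; the
shard dictionaries and function names are made up for illustration and
are not Lucene APIs.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style idf: log(numDocs / (docFreq + 1)) + 1."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def global_idf(shard_stats, term):
    """Compute idf for `term` from statistics summed over all shards.

    `shard_stats` is a hypothetical structure: one dict per shard with a
    `max_doc` count and a `doc_freq` mapping of term -> document frequency.
    In a distributed setup, gathering each entry would typically cost one
    remote call per shard, which is where the extra RPC traffic comes from.
    """
    total_docs = sum(s["max_doc"] for s in shard_stats)
    total_df = sum(s["doc_freq"].get(term, 0) for s in shard_stats)
    return idf(total_df, total_docs)


# Two equally sized shards with very different "personalities" for one term.
shards = [
    {"max_doc": 1000, "doc_freq": {"lucene": 9}},
    {"max_doc": 1000, "doc_freq": {"lucene": 499}},
]

local = [idf(s["doc_freq"]["lucene"], s["max_doc"]) for s in shards]
combined = global_idf(shards, "lucene")

print("per-shard idf:", [round(v, 3) for v in local])
print("global idf:   ", round(combined, 3))
```

With purely local statistics the same term would be weighted very
differently on each shard, so scores from the two shards would not be
comparable when merged.  The global value sits between the two, and if
the shards are kept balanced the local values already land close to it,
which is why balancing can avoid the extra round trips.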


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
