lucene-java-user mailing list archives

From "Ganesh" <>
Subject Re: Index size and performance degradation
Date Tue, 14 Jun 2011 07:42:27 GMT
We tried more than 50 shards on a single system. Having multiple small indexes lets us
index and optimize the content faster. We use ParallelMultiSearcher to search across the
indexes, and the performance is really good. Now we plan to move to 64-bit, so that we could use more

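To illustrate the fan-out pattern described above, here is a minimal plain-Java sketch. It does not use Lucene's ParallelMultiSearcher; the class `ShardFanOut`, the `Hit` type, and the idea of modelling each shard as a `Function` from query string to hits are all assumptions made for illustration. Each shard is queried on its own thread and the per-shard hits are merged into one list ordered by score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical fan-out search over several shards; not a Lucene API. */
final class ShardFanOut {

    /** A scored hit; docId and score are illustrative fields. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Queries every shard in parallel and returns the topN hits by descending score. */
    static List<Hit> search(List<Function<String, List<Hit>>> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Submit one search task per shard.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Function<String, List<Hit>> shard : shards) {
                Callable<List<Hit>> task = () -> shard.apply(query);
                pending.add(pool.submit(task));
            }
            // Collect the per-shard results, then sort the merged list by score.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                merged.addAll(result.get());
            }
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real deployment each `Function` would wrap a searcher over one shard's index, and scores from different shards are only comparable if the shards are scored consistently.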

----- Original Message ----- 
From: "Shai Erera" <>
To: <>
Sent: Sunday, June 12, 2011 9:13 AM
Subject: Re: Index size and performance degradation

>I agree w/ Erick, there is no cutoff point (index size for that matter)
> above which you start sharding.
> What you can do is create a scheduled job in your system that runs a select
> list of queries and monitors their performance. Once it degrades, it shards
> the index by either splitting it (you can use IndexSplitter under contrib)
> or creating a new shard and directing new documents to it.
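The scheduled monitoring job described above could be sketched roughly as follows. This is plain Java with no Lucene calls; `QueryLatencyMonitor`, its methods, the use of the median, and the `maxSlowdown` factor are all assumptions for illustration. Each probe query is modelled as a `Runnable` that would execute one representative search, and the callback is where splitting the index or opening a new shard would go.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical latency probe; none of these names come from Lucene. */
final class QueryLatencyMonitor {
    private final List<Runnable> probeQueries; // each runs one representative search
    private final double maxSlowdown;          // e.g. 2.0 = react when median latency doubles
    private long baselineNanos = -1;           // median of the first probe run

    QueryLatencyMonitor(List<Runnable> probeQueries, double maxSlowdown) {
        if (probeQueries.isEmpty()) {
            throw new IllegalArgumentException("need at least one probe query");
        }
        this.probeQueries = probeQueries;
        this.maxSlowdown = maxSlowdown;
    }

    /** Median of the samples (the upper median when the count is even). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** True when the current latency exceeds the baseline by more than the allowed factor. */
    static boolean degraded(long baselineNanos, long currentNanos, double maxSlowdown) {
        return currentNanos > baselineNanos * maxSlowdown;
    }

    /** Runs every probe query once and returns the median wall-clock time in nanoseconds. */
    long probeOnce() {
        long[] elapsed = new long[probeQueries.size()];
        for (int i = 0; i < elapsed.length; i++) {
            long start = System.nanoTime();
            probeQueries.get(i).run();
            elapsed[i] = System.nanoTime() - start;
        }
        return median(elapsed);
    }

    /** The first call records the baseline; later calls report whether latency has degraded. */
    synchronized boolean check() {
        long current = probeOnce();
        if (baselineNanos < 0) {
            baselineNanos = current;
            return false;
        }
        return degraded(baselineNanos, current, maxSlowdown);
    }

    /** Runs check() at a fixed period and invokes onDegraded whenever it reports degradation. */
    ScheduledExecutorService start(long periodMinutes, Runnable onDegraded) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (check()) {
                onDegraded.run(); // e.g. split the index or open a new shard
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

A single slow run can be noise (cold caches, a concurrent merge), so a production version would probably require several consecutive degraded checks before resharding.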
> I think I read somewhere, not sure if it was in Solr or ElasticSearch
> documentation, about a Balancer object, which moves shards around in order
> to balance the load on the cluster. You can implement something similar
> which tries to balance the index sizes, creates new shards on-the-fly, and even
> merges shards if a whole source is suddenly removed from the system,
> etc.
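A size-based balancer along these lines might look like the sketch below. It is not the Balancer object mentioned above and not a Solr or ElasticSearch API; `ShardSizeBalancer`, the `shard-N` naming, and the single `maxShardBytes` cap are assumptions for illustration. New documents go to the smallest shard, a new shard is created when the smallest one would exceed the cap, and the two smallest shards are proposed for merging once their combined size fits under the cap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical balancer that only tracks shard sizes; it performs no index I/O. */
final class ShardSizeBalancer {
    private final long maxShardBytes;                              // assumed per-shard size cap
    private final Map<String, Long> shardBytes = new LinkedHashMap<>();
    private int nextShardId = 0;

    ShardSizeBalancer(long maxShardBytes) {
        this.maxShardBytes = maxShardBytes;
    }

    /** Picks the smallest shard for a new document, creating a shard if it would not fit. */
    String route(long docBytes) {
        String target = null;
        long smallest = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : shardBytes.entrySet()) {
            if (e.getValue() < smallest) {
                smallest = e.getValue();
                target = e.getKey();
            }
        }
        if (target == null || smallest + docBytes > maxShardBytes) {
            target = "shard-" + nextShardId++;
            shardBytes.put(target, 0L);
        }
        shardBytes.merge(target, docBytes, Long::sum);
        return target;
    }

    /** Records that content was deleted from a shard, e.g. a whole source was removed. */
    void shrink(String shard, long bytes) {
        shardBytes.computeIfPresent(shard, (name, size) -> Math.max(0L, size - bytes));
    }

    /** Returns the two smallest shards if merging them stays under the cap, else an empty list. */
    List<String> mergeCandidates() {
        List<Map.Entry<String, Long>> bySize = new ArrayList<>(shardBytes.entrySet());
        bySize.sort((a, b) -> Long.compare(a.getValue(), b.getValue()));
        List<String> result = new ArrayList<>();
        if (bySize.size() >= 2
                && bySize.get(0).getValue() + bySize.get(1).getValue() <= maxShardBytes) {
            result.add(bySize.get(0).getKey());
            result.add(bySize.get(1).getKey());
        }
        return result;
    }

    /** Current tracked size of a shard, or 0 if the shard is unknown. */
    long sizeOf(String shard) {
        return shardBytes.getOrDefault(shard, 0L);
    }
}
```

The sketch balances sizes only; it says nothing about where the shards live, which matters because splitting an index across shards on the same machine does not by itself add capacity.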
> Also, note that the 'largest index size' threshold is really a machine
> constraint and not Lucene's. So if you decide that 10 GB is your cutoff, it
> is pointless to create 10x10GB shards on the same machine -- searching them
> is just like searching a 100GB index w/ 10x10GB segments. Perhaps it's even
> worse because you consume more RAM when the indexes are split (e.g., terms
> index, field infos etc.).
> Shai
> On Sun, Jun 12, 2011 at 3:10 AM, Erick Erickson <> wrote:
>> <<<We can't assume anything about the machine running it,
>> so testing won't really tell us much>>>
>> Hmmm, then it's pretty hopeless I think. Problem is that
>> anything you say about running on a machine with
>> 2G available memory on a single processor is completely
>> incomparable to running on a machine with 64G of
>> memory available for Lucene and 16 processors.
>> There's really no such thing as an "optimum" Lucene index
>> size; it always relates to the characteristics of the
>> underlying hardware.
>> I think the best you can do is actually test on various
>> configurations, then at least you can say "on configuration
>> X this is the tipping point".
>> Sorry there isn't a better answer that I know of, but...
>> Best
>> Erick
>> On Sat, Jun 11, 2011 at 3:37 PM, Itamar Syn-Hershko <>
>> wrote:
>> > Hi all,
>> >
>> > I know Lucene indexes to be at their optimum up to a certain size - said to
>> > be around several GBs. I haven't found a good discussion of this, but it's
>> > my understanding that at some point it's better to split an index into parts
>> > (a la sharding) than to continue searching on a huge index. I assume
>> > this has to do with OS and IO configurations. Can anyone point me to more
>> > info on this?
>> >
>> > We have a product that uses Lucene for various searches, and at the
>> > moment each type of search uses its own Lucene index. We plan on
>> > refactoring the way it works and combining all indexes into one - making
>> > the whole system more robust and giving it a smaller memory footprint, among
>> > other things.
>> >
>> > Assuming the above is true, we are interested in knowing how to do this
>> > correctly. Initially all our indexes will be run in one big index, but if at
>> > some index size there is a severe performance degradation, we would like to
>> > handle that correctly by starting a new FSDirectory index to flush into, or
>> > by re-indexing and moving large indexes into their own Lucene index.
>> >
>> > Are there any guidelines for measuring or estimating this correctly?
>> > What should we be aware of while considering all that? We can't assume
>> > anything about the machine running it, so testing won't really tell us
>> > much...
>> >
>> > Thanks in advance for any input on this,
>> >
>> > Itamar.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail:
>> > For additional commands, e-mail:
>> >
>> >