lucene-solr-user mailing list archives

From "Reitzel, Charles" <Charles.Reit...@tiaa-cref.org>
Subject RE: optimize status
Date Tue, 30 Jun 2015 00:44:17 GMT
I see what you mean. Many thanks for the details.

-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk] 
Sent: Monday, June 29, 2015 6:36 PM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

Reitzel, Charles <Charles.Reitzel@tiaa-cref.org> wrote:
> Question, Toke: in your "immutable" cases, don't the benefits of 
> optimizing come mostly from eliminating deleted records?

Not for us. We have about 1 deleted document for every 1,000 to 10,000 standard documents.

> Is there any material difference in heap, CPU, etc. between 1, 5 or 10 segments?
> I.e. at how many segments/shard do you see a noticeable performance hit?

The distinction that matters is really 1 segment versus more than 1, coupled with 0 deleted
records versus more than 0.

Having 1 segment means that String faceting benefits from not having to map between segment
ordinals and global ordinals. That's a speed increase (just a null check instead of a memory
lookup) as well as a heap requirement reduction: We save 2GB+ heap per shard on that account
(our current heap size is 8GB). Granted, we facet on 600M values for one of the fields, which
I don't think is very common.
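
A minimal sketch of the ordinal mapping described above, not Lucene's actual implementation. The function names and the list-based layout are assumptions made for illustration; the point is only that a single segment needs no mapping tables at all, while several segments need one table per segment.

```python
from typing import List, Optional, Tuple


def build_global_ordinals(
    segments: List[List[str]],
) -> Tuple[List[str], Optional[List[List[int]]]]:
    """Hypothetical helper: each segment is a sorted list of unique terms.

    Returns the global term list plus one segment-to-global table per
    segment, or None for the tables when there is only one segment.
    """
    global_terms = sorted({term for seg in segments for term in seg})
    if len(segments) == 1:
        # Segment ordinals already are global ordinals: no tables, no heap.
        return global_terms, None
    position = {term: i for i, term in enumerate(global_terms)}
    maps = [[position[term] for term in seg] for seg in segments]
    return global_terms, maps


def to_global(maps: Optional[List[List[int]]], segment: int, seg_ord: int) -> int:
    """Resolve a segment ordinal to a global ordinal."""
    if maps is None:
        return seg_ord  # single segment: just a null check
    return maps[segment][seg_ord]  # several segments: a memory lookup


terms, maps = build_global_ordinals([["apple", "cherry"], ["banana", "cherry"]])
single_terms, single_maps = build_global_ordinals([["apple", "banana"]])
```

The tables hold one entry per term per segment, which is where the heap cost comes from when a field has hundreds of millions of values.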

Having 0 deleted records is related: the usual bitmap of deleted documents is null, meaning
faster checks.
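
As a rough illustration of that null-bitmap shortcut (a sketch with a hypothetical `count_live` helper, not Solr or Lucene code): when there are no deletions the bitmap is absent and every candidate document counts, so the per-document check disappears.

```python
from typing import List, Optional, Sequence


def count_live(doc_ids: Sequence[int], live_docs: Optional[List[bool]] = None) -> int:
    """Count candidate documents that are not deleted.

    live_docs is None when the index has no deleted documents
    (an assumption mirroring the null bitmap described above).
    """
    if live_docs is None:
        return len(doc_ids)  # no deletions: skip the per-document check
    return sum(1 for doc in doc_ids if live_docs[doc])


no_deletes = count_live([0, 1, 2, 3])
with_deletes = count_live([0, 1, 2, 3], [True, False, True, True])
```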

Most of the performance benefit probably comes from the freed memory. We have 25 shards/machine,
so sparing 2GB gives us an extra 50GB of disk cache. The performance increase for that is
20-40%, guesstimated from some previous tests where we varied the disk cache size.
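
The arithmetic behind the 50GB figure, written out as a small sketch. The variable names are illustrative; the numbers are the ones given above.

```python
SHARDS_PER_MACHINE = 25        # shards hosted on one machine
HEAP_SAVED_PER_SHARD_GB = 2    # heap no longer needed for ordinal maps

# Memory not claimed by the JVM heaps is left to the OS for disk cache.
extra_disk_cache_gb = SHARDS_PER_MACHINE * HEAP_SAVED_PER_SHARD_GB
```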


I doubt that there is much difference between 2, 5, 10 or even 20 segments. The people at
UKWA are running some tests on different degrees of optimization of their 30-shard TB-class
index. You'll have to dig a bit, but there might be relevant results: https://github.com/ukwa/shine/tree/master/python/test-logs

> Also, I am curious whether you have experimented much with the
> maxMergedSegmentMB and reclaimDeletesWeight properties of the TieredMergePolicy?

I have zero experience with that: We build the shards one at a time and don't touch them after
that. 90% of our building power goes to Tika analysis, so there hasn't been an apparent need
for tuning Solr's indexing.

- Toke Eskildsen


