lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Optimize SolrCloud without downtime
Date Wed, 25 Mar 2015 16:18:34 GMT
That's a high number of deleted documents as a percentage of your
index! Or at least I find those numbers surprising. When segments are
merged in the background during normal indexing, quite a bit of weight
is given to segments that have a high percentage of deleted docs. I
usually see at most 10-20% of docs deleted.

So what kinds of things have you done to get into this state? Did you
optimize previously? Change the merge policy? Anything else?
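
For reference, those per-core numbers are easy to check through the Luke
request handler. A minimal sketch, assuming a core named "products" running
on localhost:8983 (both the host and the core name are placeholders):

    curl "http://localhost:8983/solr/products/admin/luke?numTerms=0&wt=json"

The JSON response includes numDocs, maxDoc and deletedDocs for the core;
deletedDocs divided by maxDoc gives the fraction of the index that merging
has not yet reclaimed.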


On Wed, Mar 25, 2015 at 8:08 AM, pavelhladik <> wrote:
> Hi,
> I haven't found an answer yet, so please help. We have a standalone Solr 5.0.0
> with a few cores so far. One of those cores contains:
> numDocs: 120M
> deletedDocs: 110M
> Our data change frequently, which is why there are so many deletedDocs.
> An optimized core takes around 50GB on disk; we are now at almost 100GB, and
> I'm looking for the best way to optimize this huge core without downtime. I
> know the optimization runs in the background, but while it is running our
> search system is slow and I sometimes receive errors - this behavior is
> effectively downtime for us.
> I would like to switch to SolrCloud. Performance is not an issue, so I don't
> need the sharding feature at this time. I'm more interested in replication and
> in distributing requests through an Nginx proxy. The idea is:
> 1) the proxy forwards requests to node1 while the cores on node2 are optimized
> 2) the proxy forwards requests to node2 while the cores on node1 are optimized
> But when I optimize on node2, node1 runs the optimization as well, even if I
> use "distrib=false" with curl (sketched below).
> Can you please recommend an architecture for optimizing without downtime? Many
> thanks.
> Pavel
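
For clarity, the optimize call Pavel refers to looks roughly like the
following; node2, the port and the core name are placeholders, and
maxSegments is optional:

    curl "http://node2:8983/solr/core1/update?optimize=true&maxSegments=1&distrib=false"

In SolrCloud, an explicit commit or optimize sent to one node is forwarded
to the other replicas of the collection, which is consistent with the
behavior Pavel describes above.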
