lucene-solr-user mailing list archives

From Robert Stewart <>
Subject Re: Replicating Large Indexes
Date Tue, 01 Nov 2011 15:11:38 GMT
Optimization merges the index down to a single segment (one huge segment), so the entire index will be copied
on replication.  So you really do need 2x disk in some cases.

Do you really need to optimize?  We have a pretty big total index (about 200 million docs)
and we never optimize.  But we do have a sharded index, so our largest individual indexes are only around
10 million docs.  We use a merge factor of 2, and we run replication every minute.
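For reference, a setup like the one described above roughly corresponds to solrconfig.xml fragments like these (a sketch only, based on the standard Solr 3.x ReplicationHandler; the master hostname is a placeholder, not something from this thread):

```xml
<!-- Sketch: a low mergeFactor keeps the segment count small without optimize -->
<indexDefaults>
  <mergeFactor>2</mergeFactor>
</indexDefaults>

<!-- On the master: publish a new index version after every commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master once a minute (hh:mm:ss) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

With replicateAfter=commit and frequent polling, slaves only fetch the segment files that changed since the last poll, which is why avoiding a full merge keeps transfers small.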

In our tests, search performance was not much better with optimization, but that may be
specific to our types of searches, etc.  You may see different results.


On Nov 1, 2011, at 12:46 AM, Jason Biggin wrote:

> Wondering if anyone has experience with replicating large indexes.  We have a Solr deployment
> with 1 master, 1 master/slave and 5 slaves.  Our index contains 15+ million articles and is
> ~55GB in size.
> Performance is great on all systems.
> Debian Linux
> Apache-Tomcat
> 100GB disk
> 2 proc
> on VMWare ESXi 4.0
> We notice however that whenever the master is optimized, the complete index is replicated
> to the slaves.  This causes a 100%+ bloat in disk requirements.
> Is this normal?  Is there a way around this?
> Currently our optimize is configured as such:
> 	curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true'
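If a full optimize keeps forcing whole-index replication, a middle ground (an untested sketch using standard Solr update parameters, not something confirmed in this thread) is to stop short of a single segment, or to skip optimize and only purge deletes at commit time:

```shell
# Partial optimize: merge down to at most 16 segments instead of 1, so most
# existing segment files survive and replication copies far less data.
curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=16&waitFlush=true'

# Or skip optimize entirely and reclaim space from deleted docs on commit.
curl 'http://localhost:8080/solr/update?commit=true&expungeDeletes=true'
```

Either way, the host and port are placeholders; the key point is that only newly written segments need to be transferred to the slaves.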
> Willing to share our experiences with Solr.
> Thanks,
> Jason
