lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: ReplicationHandler - SnapPull failed to download a file completely.
Date Wed, 30 Oct 2013 20:00:27 GMT
On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:
> we are continuously getting this exception during replication from
> master to slave. our index size is 9.27 G and we are trying to replicate
> a slave from scratch.
> It's a different file each time; sometimes we get to 60% replication
> before it fails and sometimes only 10%, we never managed a successful
> replication.


> this is the master setup:
> <requestHandler name="/replication" class="solr.ReplicationHandler" >
>     <lst name="master">
>       <str name="replicateAfter">commit</str>
>       <str name="replicateAfter">startup</str>
>       <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
>       <str name="commitReserveDuration">00:00:50</str>
>     </lst>
> </requestHandler>

I assume that you're probably doing commits fairly often, resulting in a 
lot of merge activity that frequently deletes segments.  That 
"commitReserveDuration" parameter needs to be made larger.  I would 
imagine that it takes a lot more than 50 seconds to do the replication - 
even if you've got an extremely fast network, replicating 9.27GB probably 
takes several minutes.
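
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function names, the 0.7 link-efficiency figure, and the 2x safety factor are my own assumptions for illustration, not anything Solr defines; only the 9.27GB index size and the HH:MM:SS format come from this thread. Even on a gigabit link running at full line rate, 9.27GB works out to about 74 seconds, which is already longer than the 50 seconds configured.

```python
import math


def transfer_seconds(index_gb: float, link_mbit_per_s: float,
                     efficiency: float = 0.7) -> float:
    """Estimate how long copying the index takes, in seconds.

    efficiency is an assumed fraction of the nominal link speed that
    the transfer actually achieves (hypothetical value, tune to taste).
    """
    bits = index_gb * 8 * 1000 ** 3            # decimal gigabytes -> bits
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return bits / usable_bits_per_s


def reserve_duration(seconds: float, safety_factor: float = 2.0) -> str:
    """Format a padded duration as HH:MM:SS, the format used in the config.

    safety_factor is an assumed margin on top of the raw estimate.
    """
    total = math.ceil(seconds * safety_factor)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Compare a 100 Mbit/s link with a 1 Gbit/s link for a 9.27GB index.
    for mbit in (100, 1000):
        estimate = transfer_seconds(9.27, mbit)
        value = reserve_duration(estimate)
        print(f"{mbit} Mbit/s: ~{estimate:.0f}s, "
              f'<str name="commitReserveDuration">{value}</str>')
```

Treat the output as a starting point only. Measuring how long a full replication really takes on your hardware and padding that number is more reliable than any estimate.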

From the wiki page on replication:  "If your commits are very frequent 
and network is particularly slow, you can tweak an extra attribute 
<str name="commitReserveDuration">00:00:10</str>. This is roughly the 
time taken to download 5MB from master to slave. Default is 10 secs."

You've said that your network is not slow, but with that much data, all 
networks are slow.

