lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Date Mon, 27 Oct 2014 12:40:15 GMT
OK, clarify a bit more what you're doing with Hadoop. Are you using
the MapReduceIndexerTool? Or are your Hadoop jobs writing directly to

How are you measuring "out of synch"? Are you sure that you've
committed? Does "out of synch" mean reporting different result counts?
Different order? Different numbers of deleted docs? Completely
different search results? How do you know? Are you querying each
replica directly with &distrib=false?
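
To make the &distrib=false check concrete, here is a minimal sketch in plain Java (no SolrJ). It only builds the per-core query URLs and compares the numFound values you read back from each response. The host names, core names and the class and method names are hypothetical placeholders, not anything from your setup, so adjust them to match your cluster. Issue a commit first, otherwise the counts can legitimately differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

public class ReplicaCountCheck {

    // Build a query URL that asks one core for its own documents only.
    // distrib=false keeps the core from fanning the request out to other
    // shards; rows=0 returns just the numFound count.
    static String localCountUrl(String coreUrl, String query) {
        String base = coreUrl.endsWith("/")
                ? coreUrl.substring(0, coreUrl.length() - 1)
                : coreUrl;
        try {
            return base + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=0&wt=json&distrib=false";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM.
            throw new IllegalStateException(e);
        }
    }

    // True when every replica of one shard reported the same numFound.
    // An empty map is treated as "in synch" (nothing to compare).
    static boolean inSync(Map<String, Long> numFoundByReplica) {
        Long first = null;
        for (Long count : numFoundByReplica.values()) {
            if (first == null) {
                first = count;
            } else if (!first.equals(count)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical core URLs for the two replicas of a single shard.
        String[] replicas = {
            "http://host1:8983/solr/collection1_shard1_replica1",
            "http://host2:8983/solr/collection1_shard1_replica2"
        };
        // Fetch each URL (browser, curl, ...) and note the numFound value.
        for (String replica : replicas) {
            System.out.println(localCountUrl(replica, "*:*"));
        }
    }
}
```

Once you have the numFound value from each response, put them in a map keyed by replica and pass it to inSync. Matching counts do not prove the replicas hold identical documents, but a mismatch after a commit is a clear sign that something is off.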

Details matter a lot here ;)

On Sun, Oct 26, 2014 at 9:59 PM, S.L <> wrote:
> Folks,
> I have posted about this previously. I am using SolrCloud 4.10.1 and have
> a sharded collection with 6 nodes, 3 shards and a replication factor of 2.
> I am indexing into Solr using a Hadoop job. I have 15 Map fetch tasks that
> can each have up to 5 threads, so the load on the indexing side can get
> as high as 75 concurrent threads.
> I am facing an issue where the replicas of particular shard(s) are
> consistently getting out of synch. Initially I thought this was because I
> was using a custom component, but I did a fresh install, removed the
> custom component and reindexed using the Hadoop job, and I still see the
> same behavior.
> I do not see any exceptions in my catalina.out, like OOM or any other
> exceptions. I suspect this could be because of the multi-threaded
> indexing nature of the Hadoop job. I use CloudSolrServer from my Java code
> to index, and I initialize the CloudSolrServer using a 3 node ZK ensemble.
> Does anyone know of any known issues with highly multi-threaded indexing
> and SolrCloud?
> Can someone help? This issue has been slowing things down on my end for a
> while now.
> Thanks and much appreciated!
