lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solrcloud updating issue.
Date Thu, 29 Jun 2017 17:38:28 GMT
bq: we have also 5 zookeeper instances running on each node

If that's not a typo, it's bad practice. Do you mean "5 Solr instances"?

You should need no more than 3 ZK instances in this case. My guess is
that you're seeing timeouts but that the indexing is going on in the background.
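
Note that ConcurrentUpdateSolrClient (the client in your stack trace) buffers
documents and sends them from background threads; when a batch fails it just
calls handleError, which by default only logs the error, so nothing is ever
thrown back into the thread that called add(). If you want to see those
failures you have to override it yourself. Untested sketch (the URL and the
queue/thread sizes are placeholders, use your own):

  // Collect failures from the background Runner threads so the main
  // thread can inspect them after indexing finishes.
  final java.util.Queue<Throwable> failures =
      new java.util.concurrent.ConcurrentLinkedQueue<>();
  ConcurrentUpdateSolrClient client =
      new ConcurrentUpdateSolrClient("http://wp-np2-c0:8983/solr/uniprot", 10, 4) {
        @Override
        public void handleError(Throwable ex) {
          // Default behavior is to log and drop; record the failure instead.
          failures.add(ex);
        }
      };
  // ... client.add(docs) as before ...
  client.blockUntilFinished();
  if (!failures.isEmpty()) {
    // some update batches were lost; retry them or fail the indexing job
  }

That way you find out a batch was dropped at indexing time rather than from
missing documents afterwards.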

Are you saying the 16G is physical memory, or the JVM heap? How much
physical memory do you have, and how much is used by _all_ the Java
processes running on the machine? You should leave at least 50% of the
physical memory available to the operating system (largely for the page cache), see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
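
For example (just a sketch, assuming the 16G is total physical RAM per node
and you start Solr with the stock bin/solr scripts), I'd keep the heap well
under half of that in solr.in.sh and leave the rest to the OS:

  # solr.in.sh -- illustrative value only, tune for your own load
  SOLR_HEAP="4g"   # the remaining RAM feeds the page cache MMapDirectory relies on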

Best,
Erick

On Thu, Jun 29, 2017 at 3:11 AM, Wudong Liu <wudong.liu@gmail.com> wrote:
> Hi All:
> We are trying to index a large number of documents in SolrCloud and keep
> seeing the following error, always with a similar stack trace:
>
> org.apache.solr.common.SolrException: Service Unavailable
>
> request: http://wp-np2-c0:8983/solr/uniprot/update?wt=javabin&version=2
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$57/936653983.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> The settings are:
> 5 nodes in the cluster, each with 16G of memory. The collection is
> defined with 5 shards and a replication factor of 2. The total number of
> documents is about 90m, and each document is quite large as well.
> we have also 5 zookeeper instances running on each node.
>
> On the Solr side, we can see errors like:
> solr.log.3-Error from server at http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1: Server Error
> solr.log.3-request: http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fwp-np2-c0.ebi.ac.uk%3A8983%2Fsolr%2Funiprot_shard2_replica1%2F&wt=javabin&version=2
> solr.log.3-Remote error message: Async exception during distributed update: Connect to wp-np2-c2.ebi.ac.uk:8983 timed out
> solr.log.3-     at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:948)
> solr.log.3-     at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679)
> solr.log.3-     at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> --
> solr.log.3-     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> solr.log.3-     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> solr.log.3-     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> solr.log.3-     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> solr.log.3-     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> solr.log.3-     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> solr.log.3-     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> solr.log.3-     at java.lang.Thread.run(Thread.java:745)
>
>
> The strange bit is that this exception doesn't seem to be captured by the
> try/catch block in our main thread, and the cluster seems to be in good
> health (all nodes up) after the job is done; we are just missing lots of
> documents!
>
> Any suggestions on where we should look to resolve this problem?
>
> Best Regards,
> Wudong
