From: Erick Erickson
Date: Thu, 29 Jun 2017 10:38:28 -0700
Subject: Re: Solrcloud updating issue.
To: solr-user
Reply-To: solr-user@lucene.apache.org

bq: we have also 5 zookeeper instances running on each node

If that's not a typo, it's bad practice. Do you mean "5 Solr
instances"? You should need no more than 3 ZK instances in this case.

My guess is that you're seeing timeouts but that the indexing is going
on in the background.

Are you saying you have 16G physical memory or in the JVM? How much
physical memory do you have and how much memory is used by _all_ the
java processes running on the machine? You should have at least 50% of
the physical memory available for the op system, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Thu, Jun 29, 2017 at 3:11 AM, Wudong Liu wrote:
> Hi All:
> We are trying to index a large number of documents in solrcloud and keep
> seeing the following error: org.apache.solr.common.SolrException: Service
> Unavailable, or org.apache.solr.common.SolrException: Service Unavailable
>
> but with a similar stack:
>
> request: http://wp-np2-c0:8983/solr/uniprot/update?wt=javabin&version=2
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$57/936653983.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> the settings are:
> 5 nodes in the cluster with each 16g memory, for the collection, it is
> defined with 5 shards, and replicate factor 2.
> the total number of documents is about 90m, each document size is quite
> large as well.
> we have also 5 zookeeper instances running on each node.
>
> On the solr side, we can see error like:
> solr.log.3-Error from server at http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1: Server Error
> solr.log.3-request: http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fwp-np2-c0.ebi.ac.uk%3A8983%2Fsolr%2Funiprot_shard2_replica1%2F&wt=javabin&version=2
> solr.log.3-Remote error message: Async exception during distributed update: Connect to wp-np2-c2.ebi.ac.uk:8983 timed out
> solr.log.3- at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:948)
> solr.log.3- at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679)
> solr.log.3- at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> --
> solr.log.3- at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> solr.log.3- at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> solr.log.3- at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> solr.log.3- at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> solr.log.3- at java.lang.Thread.run(Thread.java:745)
>
>
> The strange bit is this exception doesn't seem to be captured by the
> try/catch block in our main thread. and the cluster seems in the good
> health (all nodes up) after the job done, we just missing lots of
> documents!
>
> any suggestion where we should look to resolve this problem?
>
> Best Regards,
> Wudong