From: Erick Erickson
Date: Thu, 25 Aug 2016 14:35:29 -0700 (PDT)
Subject: Re: solrcloud 6.0.1 any suggestions for fixing a replica
that stubbornly remains down
To: solr-user@lucene.apache.org

This is odd. The ADDREPLICA _should_ be immediately listed as "down",
but should shortly go to "recovering" and then "active". The transition
to "active" may take a while as the index has to be copied from the
leader, but you shouldn't be stuck at "down" for very long.

Take a look at the Solr logs for both the leader of the shard and the
replica you're trying to add. They often have more complete and helpful
error messages...

Also note that you occasionally have to be patient. For instance, there's
a 3-minute wait period for leader election at times. It sounds, though,
like things aren't getting better for far longer than 3 minutes.

Best,
Erick

On Thu, Aug 25, 2016 at 2:00 PM, Jon Hawkesworth <
jon.hawkesworth@medquist.onmicrosoft.com> wrote:

> Anyone got any suggestions for how I can fix up my solrcloud 6.0.1
> replica-remains-down issue?
>
> Today we stopped all the loading and querying, brought down all 4 solr
> nodes, went into zookeeper and deleted everything under
> /collections/transcribedReports/leader_initiated_recovery/shard1/, and
> brought the cluster back up (this seeming to be a reasonably similar
> situation to https://issues.apache.org/jira/browse/SOLR-7021, where this
> workaround is described, albeit for an older version of solr).
> After a while things looked ok, but when we attempted to move the second
> replica back to the original node (by creating a third replica and then
> deleting the temporary one, which wasn't on the node we wanted it on), we
> immediately got a 'down' status on the node (and it's stayed that way
> ever since), with 'Could not publish as ACTIVE after successful recovery'
> messages appearing in the logs.
>
> It's as if there is something specifically wrong with that node that
> stops us from ever having a functioning replica of shard1 on it.
>
> The weird thing is that shard2 on the same (problematic) node seems fine.
>
> Other stuff we have tried includes:
>
> - issuing a REQUESTRECOVERY
> - moving from 2 to 4 nodes
> - adding more replicas on other nodes (new replicas immediately go into
>   down state and stay that way)
>
> The system is solrcloud 6.0.1 running on 4 nodes. There's 1 collection
> with 4 shards, and I'm trying to have 2 replicas on each of the 4 nodes.
> Currently each shard is managing approx 1.2 million docs (mostly just
> text, usually 10-20k in size each).
>
> Any suggestions would be greatly appreciated.
>
> Many thanks,
>
> Jon
>
> *Jon Hawkesworth*
> Software Developer
>
> Hanley Road, Malvern, WR13 6NP. UK
> O: +44 (0) 1684 312313
> *jon.hawkesworth@mmodal.com www.mmodal.com*
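For anyone finding this thread later: the replica states Erick describes
(down / recovering / active) can be checked programmatically from the
Collections API's CLUSTERSTATUS response. Below is a minimal sketch; the
dict shape follows Solr 6.x's CLUSTERSTATUS JSON, but the collection,
core, and node names in the sample are made up for illustration, not
taken from this thread.

```python
def find_down_replicas(cluster_status):
    """Walk a CLUSTERSTATUS-style dict and return a tuple for every
    replica whose state is not 'active' (i.e. down or recovering)."""
    stuck = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                if replica.get("state") != "active":
                    stuck.append((coll_name, shard_name, replica_name,
                                  replica.get("node_name"),
                                  replica.get("state")))
    return stuck

# Abbreviated example response (hypothetical node names):
sample = {
    "cluster": {
        "collections": {
            "transcribedReports": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active",
                                           "node_name": "node1:8983_solr",
                                           "leader": "true"},
                            "core_node5": {"state": "down",
                                           "node_name": "node2:8983_solr"},
                        }
                    }
                }
            }
        }
    }
}

for coll, shard, replica, node, state in find_down_replicas(sample):
    print(f"{coll}/{shard}/{replica} on {node}: {state}")
# → transcribedReports/shard1/core_node5 on node2:8983_solr: down
```

In practice you would feed it the parsed JSON from
http://host:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json
and rerun it while the replica is (hopefully) transitioning from "down"
through "recovering" to "active".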