Subject: Re: shard1 gone missing ...
From: Mark Miller <markrmiller@gmail.com>
Date: Fri, 31 Jan 2014 10:22:55 -0500
To: solr-user@lucene.apache.org
Message-Id: <6143D0E3-E007-451C-B1CB-AAEB2A0118F9@gmail.com>
In-Reply-To: <52EBBDA9.3080901@gmail.com>

Would probably need to see some logs to have an idea of what happened. Would also be nice to see the after state of zk in a text dump.

You should be able to fix it, as long as you have the index on a disk; just make sure it is where it is expected and manually update the clusterstate.json. Would be good to take a look at the logs and see if it tells anything first, though.

I'd also highly recommend you try moving to Solr 4.6.1 when you can. We have fixed many, many, many bugs around SolrCloud in the four releases since 4.4. You can follow the progress in the CHANGES file we update for each release.

I wrote a little about 4.6.1 as it relates to SolrCloud here: https://plus.google.com/+MarkMillerMan/posts/CigxUPN4hbA

- Mark

http://about.me/markrmiller

On Jan 31, 2014, at 10:13 AM, David Santamauro wrote:

> Hi,
>
> I have a strange situation. I created a collection with 4 nodes (separate servers, numShards=4), then proceeded to index data ... all had been seemingly well until this morning, when I had to reboot one of the nodes.
>
> After the reboot, the node I rebooted went into recovery mode! This is completely illogical, as there is 1 shard per node (no replicas).
>
> What could have possibly happened to 1) trigger a recovery and 2) make the node think it has a replica to even recover from?
>
> Looking at the graph on the Solr admin page, it shows that shard1 disappeared and the server that was rebooted appears in a recovering state under the server home to shard2.
>
> I then looked at clusterstate.json and it confirms that shard1 is completely missing and shard2 now has a replica ... I'm baffled, confused, dismayed.
>
> Versions:
> Solr 4.4 (4 nodes with Tomcat container)
> ZooKeeper 3.4.5 (5-node ensemble)
>
> Oh, and I'm assuming shard1 is completely corrupt.
>
> I'd really appreciate any insight.
>
> David
>
> PS: I have a copy of all the shards backed up. Is there a way to possibly rsync shard1 back into place and "fix" clusterstate.json manually?
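[For anyone hitting this thread from the archives: Mark's suggestion amounts to dumping the ZK state (e.g. `get /clusterstate.json` from zkCli.sh) and checking which shards actually have an active replica before repairing the file by hand. Below is a hedged sketch of that check in Python. The JSON layout (collection -> "shards" -> "replicas" -> per-replica "state") matches the Solr 4.x clusterstate.json format; the function name and the sample dump are illustrative, not from the thread.]

```python
import json

def missing_or_down_shards(clusterstate, collection, expected_shards):
    """Return the names of expected shards that are absent from the
    clusterstate or have no replica in the 'active' state."""
    shards = clusterstate.get(collection, {}).get("shards", {})
    bad = []
    for name in expected_shards:
        replicas = shards.get(name, {}).get("replicas", {})
        if not any(r.get("state") == "active" for r in replicas.values()):
            bad.append(name)
    return bad

# Hypothetical dump mirroring the situation described above: shard1 has
# vanished and the rebooted node shows up as a recovering replica of shard2.
dump = json.loads("""
{
  "collection1": {
    "shards": {
      "shard2": {
        "replicas": {
          "core_node2": {"state": "active"},
          "core_node5": {"state": "recovering"}
        }
      },
      "shard3": {"replicas": {"core_node3": {"state": "active"}}},
      "shard4": {"replicas": {"core_node4": {"state": "active"}}}
    }
  }
}
""")

print(missing_or_down_shards(dump, "collection1",
                             ["shard1", "shard2", "shard3", "shard4"]))
# -> ['shard1']
```

[Running this against a real text dump of /clusterstate.json would flag shard1 as missing before any manual edit, which is the "look first" step Mark recommends ahead of rewriting the file.]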