Subject: Re: Solr cloud clusterstate.json update query ?
From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Wed, 6 May 2015 07:54:26 -0700

Gopal:

Did you see my previous answer?

Best,
Erick

On Tue, May 5, 2015 at 9:42 PM, Gopal Jee wrote:
> About <2>: the entries under live_nodes in ZooKeeper are ephemeral nodes
> (see ZooKeeper ephemeral nodes). So, once the connection from the Solr
> zkClient to ZooKeeper is lost, these nodes disappear automatically. AFAIK,
> clusterstate.json is updated by the Overseer based on messages published
> to a queue in ZooKeeper by the Solr zkClients. In case a Solr node dies
> ungracefully, I am not sure how this event gets reflected in
> clusterstate.json.
> *Can someone shed some light* on ungraceful Solr shutdown and the
> consequent status update in clusterstate? I guess there would be some way,
> because all nodes in a cluster decide cluster state based on the watched
> clusterstate.json node. They will not be watching live_nodes to update
> their state.
>
> Gopal
>
> On Wed, May 6, 2015 at 6:33 AM, Erick Erickson wrote:
>> About <1>: this shouldn't be happening, so I wouldn't concentrate
>> there first. The most common reason is that you have a short ZooKeeper
>> timeout and the replicas go into a stop-the-world garbage collection
>> that exceeds the timeout. So the first thing to do is to see if that's
>> happening. Here are a couple of good places to start:
>>
>> http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
>> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
>>
>> <2> A partial answer is that ZK does a keep-alive type thing, and if the
>> Solr nodes it knows about don't reply, it marks the nodes as down.
>>
>> Best,
>> Erick
>>
>> On Tue, May 5, 2015 at 5:42 AM, Sai Sreenivas K wrote:
>>> Could you clarify the following questions:
>>> 1. Is there a way to avoid all the nodes simultaneously going into
>>> recovery when bulk indexing happens? Is there an API to disable
>>> replication on one node for a while?
>>>
>>> 2. We recently changed the host name on nodes in solr.xml, but the old
>>> host entries still exist in clusterstate.json, marked as active, even
>>> though live_nodes has the correct information. Who updates
>>> clusterstate.json if a node goes down in an ungraceful fashion without
>>> notifying its down state?
>>>
>>> Thanks,
>>> Sai Sreenivas K
>
> --
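
To make the ephemeral-node behaviour Gopal describes concrete, here is a
minimal sketch against the plain ZooKeeper Java client. The connect string
and the /demo_live_nodes path are made up for illustration (this is not
Solr's real /live_nodes layout or client code); it only shows why such
entries vanish on their own when a session ends, and how a watcher sees it:

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralLiveNodeDemo {
    public static void main(String[] args) throws Exception {
        // Two independent sessions: one "node" announcing itself, one observer
        // (conceptually, the Overseer or any other cluster member).
        ZooKeeper node = new ZooKeeper("localhost:2181", 15000, e -> {});
        ZooKeeper observer = new ZooKeeper("localhost:2181", 15000, e -> {});

        // Hypothetical parent path for the demo.
        if (node.exists("/demo_live_nodes", false) == null) {
            node.create("/demo_live_nodes", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // EPHEMERAL: ZooKeeper deletes this znode automatically when the owning
        // session ends, which is why live_nodes entries disappear without
        // anyone explicitly removing them.
        node.create("/demo_live_nodes/node1", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // The observer watches the parent and is notified when children change.
        List<String> before = observer.getChildren("/demo_live_nodes",
                (WatchedEvent e) -> System.out.println("watch fired: " + e));
        System.out.println("children before: " + before);

        // Closing (or losing) the owning session removes the ephemeral node and
        // fires the watch -- no explicit delete anywhere.
        node.close();
        Thread.sleep(2000);
        System.out.println("children after: "
                + observer.getChildren("/demo_live_nodes", false));
        observer.close();
    }
}

In the ungraceful-shutdown case the same thing happens, only after the
session timeout expires rather than immediately on close.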
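And for the timeout/GC interaction Erick points at: the thing to check is
whether full-GC pauses ever exceed the ZooKeeper session timeout, because a
pause that long stops the client's heartbeats, the server expires the
session, and the ephemeral live_nodes entry is dropped. Below is a small,
hypothetical sketch (again assuming a ZooKeeper at localhost:2181) of how a
client observes those session state changes and the timeout actually in
force:

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class SessionTimeoutCheck {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // Log every session state change. In the GC scenario above you would
        // see Disconnected and then Expired once a pause outlives the timeout;
        // Expired is the point at which this session's ephemeral nodes go away.
        Watcher stateLogger = (WatchedEvent e) -> {
            System.out.println("session state: " + e.getState());
            if (e.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        };

        // 15000 ms requested; the server may negotiate a different value.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, stateLogger);
        connected.await();

        // This is the number a worst-case GC pause has to stay under.
        System.out.println("negotiated session timeout: "
                + zk.getSessionTimeout() + " ms");

        zk.close();
    }
}

On the Solr side the requested value typically comes from zkClientTimeout in
solr.xml (settable via the zkClientTimeout system property), and the GC side
is easiest to confirm by turning on GC logging as described in the two links
quoted above.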