Date: Sat, 22 Mar 2014 12:23:40 -0700
Subject: Solr Cloud collection keep going down?
From: Software Dev
To: "solr-user@lucene.apache.org"

We have 2 collections, each with 1 shard replicated across the 5 servers in the cluster. We see a lot of flapping (down or recovering) on one of the collections. When this happens, the other collection hosted on the same machine is still marked as active. It also takes a fairly long time (~30 minutes) for the affected collection to come back online, if it comes back at all. I find that it's usually more reliable to completely shut down Solr on the affected machine and bring it back up with its core disabled. We then re-enable the core once it's marked as active.

A few questions:

1) What is the health check in SolrCloud? Put another way, what is failing that marks one collection as down but the other on the same machine as up?

2) Why does recovery take forever when a node goes down, even if it's only down for 30 seconds? Our index is only 7-8 GB and we are running on SSDs.

3) What can be done to diagnose and fix this problem?
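
As a starting point for question 3, it may help to record exactly which replicas leave the active state and on which node. The following is a minimal sketch, not a Solr API: it assumes the cluster state has already been read from /clusterstate.json in ZooKeeper and parsed into a dictionary of the form collection -> "shards" -> shard -> "replicas" -> replica. The function name non_active_replicas, the SAMPLE data, and the collection and host names are all hypothetical and only illustrate the shape.

```python
import json


def non_active_replicas(cluster_state):
    """Return (collection, shard, replica, node, state) for each replica
    whose state is anything other than "active"."""
    problems = []
    for coll_name, coll in sorted(cluster_state.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            for rep_name, rep in sorted(shard.get("replicas", {}).items()):
                state = rep.get("state", "unknown")
                if state != "active":
                    node = rep.get("node_name", "?")
                    problems.append((coll_name, shard_name, rep_name, node, state))
    return problems


# Hypothetical sample data: two collections on the same two nodes,
# with one replica of collection_a stuck in "recovering".
SAMPLE = json.loads("""
{
  "collection_a": {"shards": {"shard1": {"replicas": {
    "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
    "core_node2": {"state": "recovering", "node_name": "host2:8983_solr"}
  }}}},
  "collection_b": {"shards": {"shard1": {"replicas": {
    "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
    "core_node2": {"state": "active", "node_name": "host2:8983_solr"}
  }}}}
}
""")

if __name__ == "__main__":
    for coll, shard, replica, node, state in non_active_replicas(SAMPLE):
        print(f"{coll}/{shard}/{replica} on {node}: {state}")
```

Running a check like this at a regular interval and keeping the output with timestamps would show whether the flapping is tied to one node or one collection, which can then be lined up against the Solr and ZooKeeper logs for the same period.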