Subject: Re: SolrCloud from "Stopping recovery for" warnings to crash
From: Lukas Mikuckis
To: solr-user@lucene.apache.org
Date: Mon, 24 Mar 2014 16:21:11 +0200

Yes, we upgraded Solr from 4.6.1 to 4.7 three weeks ago (two weeks before
Solr started crashing). When upgrading, we only upgraded Solr itself and
changed the versions in the collection configs.

When Solr crashes we do get an OOM, but only about 2h after the first
"Stopping recovery" warnings.

Do you have any idea what triggers the "Stopping recovery" warnings? Right
now we have no idea what could cause this issue.

On Mon, 24 Mar 2014 04:03:17 GMT, Shalin Shekhar Mangar wrote:
>
> Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
> cause out of memory issues. Can you check your logs for out of memory
> errors?
>
> On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis wrote:
> > Solr version: 4.7
> >
> > Architecture:
> > 2 solrs (1 shard, leader + replica)
> > 3 zookeepers
> >
> > Servers:
> > * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
> > * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
> > * zookeeper
> >
> > Solr data:
> > * 21 collections
> > * Many fields, small docs, docs count per collection from 1k to 500k
> >
> > About a week ago Solr started crashing. It crashes every day, 3-4 times a
> > day, usually at night. I can't tell what it could be related to, because
> > we hadn't made any configuration changes at that time. The load hasn't
> > changed either.
> >
> >
> > Everything starts with "Stopping recovery for .." warnings (every warning
> > is repeated several times):
> >
> > WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
> > zkNodeName=core_node1core=******************
> >
> > WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find
> > election node to remove
> >
> > WARN org.apache.solr.update.PeerSync; no frame of reference to tell if
> > we've missed updates
> >
> > WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame
> > of reference to tell if we've missed updates
> >
> > WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File
> > _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
> >
> > WARN - 2014-03-23 04:00:54.126;
> > org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
> > tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0000000000000003272
> > refcount=2} active=true starting pos=356216606
> >
> > Then the "Stopping recovery for .." warnings appear again:
> >
> > WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
> > zkNodeName=core_node1core=******************
> >
> > ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: No registered leader was found after
> > waiting for 4000ms , collection: collection1 slice: shard1
> >
> > ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: I was asked to wait on state down for
> > IP:PORT_solr but I still do not see the requested state. I see state:
> > active live:false
> >
> >
> > After this, the servers mostly didn't recover.
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
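
For anyone wanting to check their own logs for out-of-memory errors as suggested above, and to measure the gap between the first "Stopping recovery for" warning and the first OOM, a rough Python sketch follows. It is not part of the original thread. It assumes log lines carry the "LEVEL - timestamp; logger; message" layout seen in the timestamped excerpts above, and that an OOM shows up as the Java class name OutOfMemoryError somewhere in a line, possibly in a stack-trace line without its own timestamp. The names parse_line and summarize are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the timestamped excerpts quoted in this thread:
#   WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s*-\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});\s*"
    r"(?P<logger>[^;]+);\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("level"), m.group("logger").strip(), m.group("msg")


def summarize(lines):
    """Return (first_recovery_ts, first_oom_ts); either value may be None."""
    first_recovery = None
    first_oom = None
    last_ts = None  # timestamp of the most recent line that had one
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            last_ts = parsed[0]
            if first_recovery is None and "Stopping recovery for" in parsed[3]:
                first_recovery = last_ts
        # Stack-trace lines have no timestamp of their own, so an OOM found
        # there is attributed to the last timestamped line before it.
        if first_oom is None and "OutOfMemoryError" in line:
            first_oom = last_ts
    return first_recovery, first_oom
```

Passing an open log file to summarize (a file object iterates line by line) gives both timestamps; subtracting them shows whether the roughly 2h delay described above holds for every crash.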