From: Mark Miller
To: solr-user@lucene.apache.org
Subject: Re: overseer queue clogged
Date: Fri, 15 Mar 2013 20:05:20 -0400
Message-Id: <0FDE1E95-0E15-4830-98D4-DFACBBC2DB46@gmail.com>
References: <9F3D9860-4C7D-4E99-BFF2-1D1DE4FB75A3@gmail.com>

Strange - we hardened that loop in 4.1 - so I'm not sure what happened here.

Can you do a stack dump on the overseer and see if you see an Overseer thread running perhaps? Or just post the results?

To recover, you should be able to just restart the Overseer node and have someone else take over - they should pick up processing the queue.

Any logs you might be able to share could be useful too.

- Mark

On Mar 15, 2013, at 7:51 PM, Gary Yngve wrote:

> Also, looking at overseer_elect, everything looks fine. node is valid and
> live.
>
>
> On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve wrote:
>
>> Sorry, should have specified. 4.1
>>
>>
>>
>>
>> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller wrote:
>>
>>> What Solr version? 4.0, 4.1 4.2?
>>>
>>> - Mark
>>>
>>> On Mar 15, 2013, at 7:19 PM, Gary Yngve wrote:
>>>
>>>> my solr cloud has been running fine for weeks, but about a week ago, it
>>>> stopped dequeueing from the overseer queue, and now there are thousands
>>> of
>>>> tasks on the queue, most which look like
>>>>
>>>> {
>>>> "operation":"state",
>>>> "numShards":null,
>>>> "shard":"shard3",
>>>> "roles":null,
>>>> "state":"recovering",
>>>> "core":"production_things_shard3_2",
>>>> "collection":"production_things",
>>>> "node_name":"10.31.41.59:8883_solr",
>>>> "base_url":"http://10.31.41.59:8883/solr"}
>>>>
>>>> i'm trying to create a new collection through collection API, and
>>>> obviously, nothing is happening...
>>>>
>>>> any suggestion on how to fix this? drop the queue in zk?
>>>>
>>>> how could did it have gotten in this state in the first place?
>>>>
>>>> thanks,
>>>> gary
>>>
>>>
>>
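
When sharing results from a backed-up queue like the one described in this thread, a summary of what is queued can be easier to read than thousands of raw entries. The following is only a rough sketch, not anything from Solr itself: it assumes the queued messages have already been exported as JSON strings (how you read them out of ZooKeeper is up to you), and the function name `summarize_queue` is made up for illustration. The sample entry is the one quoted in the original report.

```python
import json
from collections import Counter


def summarize_queue(raw_items):
    """Tally queued messages by (operation, state) and by core.

    raw_items: iterable of JSON strings, one per queued message.
    This is a hypothetical helper; it does not talk to ZooKeeper or Solr.
    """
    by_kind = Counter()
    by_core = Counter()
    for raw in raw_items:
        msg = json.loads(raw)
        # Missing keys are counted under None rather than raising.
        by_kind[(msg.get("operation"), msg.get("state"))] += 1
        by_core[msg.get("core")] += 1
    return by_kind, by_core


# Sample entry copied from the report in this thread.
sample = """{
  "operation":"state",
  "numShards":null,
  "shard":"shard3",
  "roles":null,
  "state":"recovering",
  "core":"production_things_shard3_2",
  "collection":"production_things",
  "node_name":"10.31.41.59:8883_solr",
  "base_url":"http://10.31.41.59:8883/solr"}"""

if __name__ == "__main__":
    kinds, cores = summarize_queue([sample, sample])
    for kind, count in kinds.most_common(5):
        print(kind, count)
    for core, count in cores.most_common(5):
        print(core, count)
```

If one core accounts for most of the entries, that is probably worth mentioning alongside the stack dump and logs.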