Subject: Re: Cannot get shard id error - Hitting limits on creating collections
From: Shawn Heisey
Date: Tue, 08 Apr 2014 10:09:46 -0600
To: solr-user@lucene.apache.org

On 4/8/2014 9:48 AM, KNitin wrote:
> I am running SolrCloud 4.3.1 (there is a plan to upgrade to later
> versions, but that would take a few months).
> I noticed some very peculiar behavior in Solr: beyond *2496* cores I
> am unable to create any more collections, due to this error:
>
> *Could not get shard id for core.....*
>
> I also noticed in the Solr "tree" view that the overseer's collection
> work queue gets stuck
> (/overseer > collection-queue-work > qn-0000000360 > qn-0000000362 >
> qn-0000000364).
>
> The test results are as follows:
>
> With 8 shards and 2 replicas, I can create 156 collections (and then
> hit the above error).
> With 4 shards and 2 replicas, I can create 312 collections (and then
> hit the above error).
> With 2 shards and 2 replicas, I can create 624 collections (and then
> hit the above error).
>
> The total number of cores is 2496 in all of the above cases, and I am
> unable to create any more collections after that due to the "cannot
> get shard id" error.
>
> Is this a known bug, or is there a workaround? Is it fixed in future
> releases?

You're probably hitting configuration limits, which are set high enough
for "typical" scalability requirements. Certain things need to be
increased for extreme scalability. I don't know about all of them, so
this is likely an incomplete list. Rough example sketches for each one
are below my signature.

One of them, most likely the one involved here, is the maximum size of
the zookeeper database: the jute.maxbuffer system property, which
defaults to one megabyte.

Another is the maximum number of threads allowed by the servlet
container. In Jetty, this is the maxThreads parameter.

Another is the various connection and thread pool settings in the
ShardHandler config.

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Unsafe+Options
http://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches
https://cwiki.apache.org/confluence/display/solr/Moving+to+the+New+solr.xml+Format

As usual, I could be entirely incorrect about everything I'm saying.

Thanks,
Shawn
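
P.S. Here is roughly where each of these knobs lives. Treat every
number below as an arbitrary illustration, not a tuned recommendation.

jute.maxbuffer is a JVM system property, and it has to be set to the
same value on every zookeeper server *and* every zookeeper client,
which in a SolrCloud setup means every Solr instance as well:

  # On each zookeeper server, e.g. in conf/java.env
  # (10485760 bytes = 10MB, an arbitrary example value):
  JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=10485760"

  # On each Solr instance, add the same property to the startup
  # commandline (the zkHost value here uses placeholder hostnames;
  # substitute your own):
  java -Djute.maxbuffer=10485760 \
       -DzkHost=zk1:2181,zk2:2181,zk3:2181 \
       -jar start.jar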
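
For Jetty, maxThreads is set on the server's thread pool in
etc/jetty.xml. In the Jetty 8 that ships with the Solr 4.x example it
looks something like this (10000 is, if I remember right, what the
stock example config already uses):

  <Configure id="Server" class="org.eclipse.jetty.server.Server">
    <Set name="ThreadPool">
      <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
        <!-- Maximum number of threads Jetty will create. -->
        <Set name="maxThreads">10000</Set>
      </New>
    </Set>
  </Configure>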
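
The shard handler settings can go in the new-style solr.xml, as
described on the cwiki page linked above. A minimal sketch, leaving out
the rest of the file:

  <solr>
    <shardHandlerFactory name="shardHandlerFactory"
                         class="HttpShardHandlerFactory">
      <!-- Timeouts are in milliseconds; all numbers here are
           examples only, not recommendations. -->
      <int name="socketTimeout">120000</int>
      <int name="connTimeout">15000</int>
      <int name="maxConnectionsPerHost">100</int>
      <int name="maximumPoolSize">200</int>
    </shardHandlerFactory>
  </solr>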