MIME-Version: 1.0
From: Kevin Osborn
Date: Fri, 6 Sep 2013 17:21:03 -0700
Subject: Re: Solr Cloud hangs when replicating updates
To: solr-user
Cc: Mark Miller
Content-Type: multipart/alternative; boundary=089e0116166098ed3904e5c022fb

--089e0116166098ed3904e5c022fb
Content-Type: text/plain; charset=ISO-8859-1

Thanks a ton Mark. I have tried SOLR-4816 and it didn't help. But I will
try Mark's patch next week, and see what happens.

-Kevin


On Thu, Sep 5, 2013 at 4:46 AM, Erick Erickson wrote:

> If you run into this again, try a jstack trace. You should see
> evidence of being stuck in SolrCmdDistributor on a variable
> called "semaphore"... On current 4x this is around line 420.
>
> If you're using SolrJ, then SOLR-4816 is another thing to try.
>
> But Mark's patch would be best of all to test. If that doesn't
> fix it, then the jstack suggestion would at least tell us if it's
> the issue we think it is.
>
> FWIW,
> Erick
>
>
> On Wed, Sep 4, 2013 at 12:51 PM, Mark Miller wrote:
>
> > It would be great if you could give this patch a try:
> > http://pastebin.com/raw.php?i=aaRWwSGP
> >
> > - Mark
> >
> >
> > On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn wrote:
> >
> > > Thanks. If there is anything I can do to help you resolve this issue,
> > > let me know.
> > >
> > > -Kevin
> > >
> > >
> > > On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller wrote:
> > >
> > > > I'll look at fixing the root issue for 4.5. I've been putting it off
> > > > for way too long.
> > > >
> > > > Mark
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Sep 3, 2013, at 2:15 PM, Kevin Osborn wrote:
> > > >
> > > > > I was having problems updating SolrCloud with a large batch of
> > > > > records. The records are coming in bursts with lulls between updates.
> > > > >
> > > > > At first, I just tried large updates of 100,000 records at a time.
> > > > > Eventually, this caused Solr to hang. When hung, I can still query
> > > > > Solr. But I cannot do any deletes or other updates to the index.
> > > > >
> > > > > At first, my updates were going as SolrJ CSV posts. I have also tried
> > > > > local file updates and had similar results. I finally slowed things
> > > > > down to just use SolrJ's Update feature, which is basically just
> > > > > JavaBin. I am also sending over just 100 at a time in 10 threads.
> > > > > Again, it eventually hung.
> > > > >
> > > > > Sometimes, Solr hangs in the first couple of chunks. Other times, it
> > > > > hangs right away.
> > > > >
> > > > > These are my commit settings:
> > > > >
> > > > >
> > > > > 15000
> > > > > 5000
> > > > > false
> > > > >
> > > > >
> > > > > 30000
> > > > >
> > > > >
> > > > > I have tried quite a few variations with the same results. I also tried
> > > > > various JVM settings with the same results.
> > > > > The only thing that seems to help is reducing the cluster size
> > > > > from 2 to 1.
> > > > >
> > > > > I also did a jstack trace. I did not see any explicit deadlocks, but I
> > > > > did see quite a few threads in WAITING or TIMED_WAITING. It is
> > > > > typically something like this:
> > > > >
> > > > > java.lang.Thread.State: WAITING (parking)
> > > > >     at sun.misc.Unsafe.park(Native Method)
> > > > >     - parking to wait for <0x000000074039a450> (a java.util.concurrent.Semaphore$NonfairSync)
> > > > >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> > > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> > > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> > > > >     at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> > > > >     at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> > > > >     at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> > > > >     at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> > > > >     at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> > > > >     at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
> > > > >     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
> > > > >     at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
> > > > >     at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
> > > > >     at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
> > > > >     at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
> > > > >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > > > >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > > > >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > > >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> > > > >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> > > > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> > > > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> > > > >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> > > > >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> > > > >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > > > >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
> > > > >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > > > >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> > > > >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> > > > >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > > > >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> > > > >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> > > > >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> > > > >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> > > > >
> > > > > It basically appears that Solr gets stuck while trying to acquire a
> > > > > semaphore that never becomes available.
> > > > >
> > > > > Anyone have any ideas? This is definitely causing major problems for
> > > > > us.
> > > > >
> > > > > --
> > > > > *KEVIN OSBORN*
> > > > > LEAD SOFTWARE ENGINEER
> > > > > CNET Content Solutions
> > > > > OFFICE 949.399.8714
> > > > > CELL 949.310.4677 SKYPE osbornk
> > > > > 5 Park Plaza, Suite 600, Irvine, CA 92614
> > > >
> > >
> >
> > --
> > - Mark
>

--089e0116166098ed3904e5c022fb--