Subject: Re: Recovery Thread Blocked
To: solr-user@lucene.apache.org
From: Rallavagu
Date: Fri, 2 Oct 2015 16:17:54 -0700
Message-ID: <560F10A2.6070605@gmail.com>
In-Reply-To: <560F0F63.1020907@gmail.com>

Here is the stack trace of the thread that is holding the lock.
"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting, native_blocked, daemon -- Waiting for notification on: org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock] at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8 at syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327 at RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method) at java/lang/Object.wait(J)V(Native Method) at java/lang/Thread.join(Thread.java:1206) ^-- Lock released while waiting: org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock] at java/lang/Thread.join(Thread.java:1259) at org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331) ^-- Holding lock: java/lang/Object@0x114d8dd00[recursive] at org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297) ^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock] at org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770) at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) Stack trace of one of the 870 threads that is waiting for the lock to be released. 
"Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked, native_blocked, daemon -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat lock] at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8 at syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327 at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method) at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized] at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized] at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized] at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized] at org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290) at org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770) at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) On 10/2/15 4:12 PM, Rallavagu wrote: > Solr 4.6.1 on Tomcat 7, single shard 4 node cloud with 3 node zookeeper > > During updates, some nodes are going very high cpu and becomes > unavailable. The thread dump shows the following thread is blocked 870 > threads which explains high CPU. Any clues on where to look? 
>
> "Thread-56848" id=79207 idx=0x38 tid=3169 prio=5 alive, blocked, native_blocked, daemon
>     -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat lock]
>     at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
>     at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
>     at syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
>     at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
>     at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
>     at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
>     at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
>     at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
>     at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
>     at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
>     at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
>     at org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)
>     at org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)
>     at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
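
Read together, the traces suggest this pattern: the thread in doRecovery holds the java/lang/Object monitor, re-enters it in cancelRecovery, and then sits in Thread.join waiting for the running recovery thread, while every other doRecovery caller blocks on that same monitor. Below is a minimal Java sketch of that pattern, not the Solr source. The class RecoveryLockSketch, the helper countBlockedCallers, and the latch standing in for a slow recovery are all hypothetical names made up for illustration; only the lock/join shape is taken from the traces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; NOT the actual DefaultSolrCoreState code.
final class RecoveryLockSketch {
    private final Object recoveryLock = new Object();
    private Thread recoveryThread; // guarded by recoveryLock

    void doRecovery(Runnable work) throws InterruptedException {
        synchronized (recoveryLock) {          // other callers block here
            cancelRecovery();
            recoveryThread = new Thread(work, "recovery");
            recoveryThread.start();
        }
    }

    void cancelRecovery() throws InterruptedException {
        synchronized (recoveryLock) {          // re-entrant acquisition
            if (recoveryThread != null) {
                recoveryThread.join();         // waits while still holding recoveryLock
            }
        }
    }

    // Starts one slow "recovery", then lets `callers` threads request another one.
    // Returns how many of them were observed in state BLOCKED on recoveryLock.
    static int countBlockedCallers(int callers) {
        RecoveryLockSketch state = new RecoveryLockSketch();
        CountDownLatch finish = new CountDownLatch(1); // stand-in for a slow recovery
        Runnable slowRecovery = () -> {
            try {
                finish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            state.doRecovery(slowRecovery);
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                Thread t = new Thread(() -> {
                    try {
                        state.doRecovery(slowRecovery);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "caller-" + i);
                t.start();
                threads.add(t);
            }
            int blocked = 0;
            long deadline = System.currentTimeMillis() + 5000;
            while (System.currentTimeMillis() < deadline) {
                blocked = 0;
                for (Thread t : threads) {
                    if (t.getState() == Thread.State.BLOCKED) {
                        blocked++;
                    }
                }
                if (blocked == callers - 1) {
                    break;                     // one caller is in join(), the rest are blocked
                }
                Thread.sleep(10);
            }
            finish.countDown();                // let the recovery finish so callers drain
            for (Thread t : threads) {
                t.join();
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked callers: " + countBlockedCallers(5));
    }
}
```

In this sketch one caller ends up waiting in join() while holding the lock and the remaining callers show up as BLOCKED, which mirrors the shape of the dump above. If the same thing is happening here, the thing to look at would be why the recovery thread being joined does not finish, rather than the 870 waiters themselves.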