Message-ID: <17222617.48731283836473686.JavaMail.jira@thor>
Date: Tue, 7 Sep 2010 01:14:33 -0400 (EDT)
From: "Todd Lipcon (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction
In-Reply-To: <27765381.46761283822253014.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906675#action_12906675 ]

Todd Lipcon commented on HBASE-2964:
------------------------------------

As noted on the list, this seems to be due to HBASE-2461. Prior to 2461, when we split, we would close the region before doing any of the writes to META, and didn't hold any locks while doing the META updates. Now we keep the write lock all the way through, even after closing the region.

I think simply moving the writeLock().unlock() up after the this.parent.close(false) in SplitTransaction should fix this issue. I'm testing that change on my test cluster now.

> Deadlock when RS tries to RPC to itself inside SplitTransaction
> ---------------------------------------------------------------
>
>                 Key: HBASE-2964
>                 URL: https://issues.apache.org/jira/browse/HBASE-2964
>             Project: HBase
>          Issue Type: Bug
>          Components: ipc, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation:
> - All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread.
> - CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be on the same server as is executing the split.
> Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens. Thus, the entire RS gets deadlocked.
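
For illustration only, the situation described above can be modelled in a few lines of plain Java. This is a rough sketch and not HBase code: the class name SelfRpcDeadlockSketch, the method selfRpcCompletes, the pool size of 2, and the 500 ms timeout are all made up for the example. A fixed thread pool stands in for the IPC handler threads, a ReentrantReadWriteLock stands in for the region lock, and a task submitted back to the same pool stands in for the META edit that the split thread sends to its own server. A timeout is used in place of a true hang so that the sketch terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the reported deadlock; none of these names exist in HBase.
final class SelfRpcDeadlockSketch {

    // Returns true if the "self RPC" finished within the timeout, false if it starved.
    static boolean selfRpcCompletes(boolean unlockBeforeRpc) throws Exception {
        final int handlerCount = 2; // assumed size of the handler pool
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        CountDownLatch handlersBusy = new CountDownLatch(handlerCount);

        // The calling thread plays the split thread and takes the region write lock.
        regionLock.writeLock().lock();
        boolean held = true;
        try {
            // Client requests for the region occupy every handler thread;
            // each one blocks on the read lock while the write lock is held.
            for (int i = 0; i < handlerCount; i++) {
                handlers.submit(() -> {
                    handlersBusy.countDown();
                    regionLock.readLock().lock();
                    regionLock.readLock().unlock();
                });
            }
            handlersBusy.await();

            // The proposed fix, in this model: release the lock before the RPC.
            if (unlockBeforeRpc) {
                regionLock.writeLock().unlock();
                held = false;
            }

            // The "META edit" needs a free handler on the same server.
            Future<String> metaEdit = handlers.submit(() -> "meta updated");
            try {
                metaEdit.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // every handler is stuck behind the lock we hold
            }
        } finally {
            if (held) {
                regionLock.writeLock().unlock();
            }
            handlers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lock held during RPC:    " + selfRpcCompletes(false));
        System.out.println("lock released before RPC: " + selfRpcCompletes(true));
    }
}
```

In this model the first call starves because the queued task can never reach a handler while the caller holds the write lock, and the second call goes through once the lock is released first. Whether moving the unlock is sufficient in the real SplitTransaction is exactly what the comment above says is still being tested.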