Date: Fri, 15 Apr 2011 22:54:05 +0000 (UTC)
From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Message-ID: <194083030.61468.1302908045949.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HBASE-3789) Cleanup the locking contention in the master

Cleanup the locking contention in the master
--------------------------------------------

                 Key: HBASE-3789
                 URL: https://issues.apache.org/jira/browse/HBASE-3789
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.2
            Reporter: Jean-Daniel Cryans
            Priority: Blocker
             Fix For: 0.92.0


The new master uses a lot of synchronized blocks to be safe, but it only takes a few jstacks to see that there are multiple layers of lock contention when a bunch of regions are moving (like when the balancer runs). The main culprits are regionInTransition in AssignmentManager, ZKAssign, which uses ZKW.getZNnodes (basically another set of regions in transition), and locking at the RegionState level.

My understanding is that even though we have multiple threads to handle regions in transition, everything is actually serialized. Most of the time, lock holders are talking to ZK or a region server, which can take a few milliseconds. A simple example: when AssignmentManager wants to update the timers for all the regions on a RS, it will usually be waiting on another thread that's holding the lock while talking to ZK.
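
To illustrate the pattern described above, here is a minimal, self-contained sketch. It is not taken from the HBase code base: the class LockContentionSketch, its methods, the "region-a" name, and the 200 ms sleep standing in for a ZK or region server round trip are all hypothetical, and the delay is exaggerated so the effect is easy to see. It compares a thread that holds a shared lock across a slow remote call with one that takes the lock only for the in-memory update, and measures how long a timer-updating thread waits in each case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class LockContentionSketch {
    // Hypothetical stand-in for the master's shared regions-in-transition state.
    private final Map<String, Long> timers = new HashMap<>();
    private final Object lock = new Object();

    // Simulates a round trip to ZooKeeper or a region server (assumed delay).
    private static void slowRemoteCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Contended pattern: the remote call happens while the lock is held.
    void transitionHoldingLock(String region, long rpcMillis, CountDownLatch started) {
        synchronized (lock) {
            started.countDown();           // signal that the lock is now held
            slowRemoteCall(rpcMillis);
            timers.put(region, System.currentTimeMillis());
        }
    }

    // Narrower pattern: remote call first, lock only for the in-memory update.
    void transitionNarrowLock(String region, long rpcMillis, CountDownLatch started) {
        started.countDown();               // signal that the remote call is starting
        slowRemoteCall(rpcMillis);
        synchronized (lock) {
            timers.put(region, System.currentTimeMillis());
        }
    }

    // Refreshes every timer; returns how long it waited for the lock, in ms.
    long updateTimers() {
        long begin = System.nanoTime();
        synchronized (lock) {
            long waited = (System.nanoTime() - begin) / 1_000_000;
            long now = System.currentTimeMillis();
            timers.replaceAll((region, old) -> now);
            return waited;
        }
    }

    // Runs one transition on a worker thread and measures the updater's wait.
    public static long measure(boolean holdLockDuringRpc) {
        LockContentionSketch sketch = new LockContentionSketch();
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            if (holdLockDuringRpc) {
                sketch.transitionHoldingLock("region-a", 200, started);
            } else {
                sketch.transitionNarrowLock("region-a", 200, started);
            }
        });
        worker.start();
        try {
            started.await();               // wait until the worker is under way
            long waited = sketch.updateTimers();
            worker.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lock held during remote call: waited " + measure(true) + " ms");
        System.out.println("lock held for update only:    waited " + measure(false) + " ms");
    }
}
```

In the first case the updater should wait for roughly the whole remote call, while in the second it should get the lock almost immediately. Moving the remote call outside the lock is only one possible direction, and it is not free: state can change between the call and the update, so a real fix would have to re-check it under the lock.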