Message-ID: <8418945.1181024906691.JavaMail.jira@brutus>
Date: Mon, 4 Jun 2007 23:28:26 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1269) DFS Scalability: namenode throughput impacted because of global FSNamesystem lock
In-Reply-To: <13878106.1176926535288.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1269:
-------------------------------------

    Attachment: chooseTargetLock2.patch

Incorporated Konstantin's review comments.

1. NetworkTopology.isOnSameRack looks at node.getParent(). These calls are
   protected by the clusterMap lock, so I kept them as they were and made no
   change here.
2. NetworkTopology.getDistance(): removed the redundant declaration of i.
3. Host2NodesMap.add locking issue: this was a good catch. I made the change
   and fixed the indentation.
4. Moved the LOG statement in getAdditionalBlock as suggested.

I also ran randomWriter on a 10-node cluster. The test ran to completion: the
total elapsed time was 2 hours 40 minutes without this patch and 2 hours 31
minutes with it, and not a single task error was encountered.
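As background for the quoted issue below, the locking structure the patch moves toward can be sketched in a few lines of Java. This is only an illustration under assumed names: the class FineGrainedLockingSketch and its fields and methods (recordBlock, pickTarget, openForWrite) are invented for the example and are not the actual FSNamesystem, NetworkTopology, or Host2NodesMap code; the real changes are in chooseTargetLock2.patch.

import java.util.HashMap;
import java.util.Map;

public class FineGrainedLockingSketch {

    // Stand-ins for FSNamesystem structures such as blocksMap and datanodeMap.
    private final Map<Long, String> blocks = new HashMap<Long, String>();
    private final Map<String, String> datanodes = new HashMap<String, String>();

    // One lock object per structure.
    private final Object blocksLock = new Object();
    private final Object datanodesLock = new Object();

    // Coarse lock, still taken when several structures must change together.
    private final Object globalLock = new Object();

    // Analogue of addStoredBlock(): touches only the block structure,
    // so the per-structure lock is enough.
    public void recordBlock(long blockId, String location) {
        synchronized (blocksLock) {
            blocks.put(blockId, location);
        }
    }

    // Analogue of getAdditionalBlock()/chooseTarget(): reads only the
    // datanode structure, so it does not contend with recordBlock().
    public String pickTarget(String hint) {
        synchronized (datanodesLock) {
            String target = datanodes.get(hint);
            return target != null ? target : "no-preferred-target";
        }
    }

    // Analogue of startFile()/closeFile(): must keep both structures
    // consistent, so it takes the global lock first and then the
    // per-structure locks in a fixed order to avoid deadlock.
    public void openForWrite(long blockId, String node, String location) {
        synchronized (globalLock) {
            synchronized (blocksLock) {
                synchronized (datanodesLock) {
                    blocks.put(blockId, location);
                    datanodes.put(node, location);
                }
            }
        }
    }
}

The only point of the sketch is the lock layout: an operation confined to a single structure takes that structure's lock, while an operation that must keep several structures consistent still goes through the coarse lock.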
> DFS Scalability: namenode throughput impacted because of global FSNamesystem lock
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-1269
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1269
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: chooseTargetLock.patch, chooseTargetLock2.patch, serverThreads1.html, serverThreads40.html
>
> I have been running a 2000-node cluster and measuring namenode performance. There are quite a few "Calls dropped" messages in the namenode log. The namenode machine has 4 CPUs, and each CPU is about 30% busy. Profiling the namenode shows that the methods that consume the most CPU are addStoredBlock() and getAdditionalBlock(). The first method is invoked when a datanode confirms the presence of a newly created block; the second is invoked when a DFSClient requests a new block for a file.
>
> I am attaching two files that were generated by the profiler. serverThreads40.html captures the scenario when the namenode had 40 server handler threads. serverThreads1.html is with 1 server handler thread (with a max_queue_size of 4000).
>
> In the case with 40 handler threads, the total elapsed time taken by FSNamesystem.getAdditionalBlock() is 1957 seconds, whereas the method it invokes (chooseTarget) takes only about 97 seconds. FSNamesystem.getAdditionalBlock is blocked on the global FSNamesystem lock for the remaining 1860 seconds.
>
> My proposal is to implement a finer-grained locking model in the namenode. The FSNamesystem has a few important data structures, e.g. blocksMap, datanodeMap, leases, neededReplication, pendingCreates, heartbeats, etc., and many of these already have their own lock. My proposal is to have a lock for each one of these data structures. Each individual lock will protect the integrity of the contents of the data structure it guards; the global FSNamesystem lock is still needed to maintain consistency across different data structures.
>
> If we implement the above proposal, neither addStoredBlock() nor getAdditionalBlock() needs to hold the global FSNamesystem lock. startFile() and closeFile() still need to acquire the global FSNamesystem lock because they have to ensure consistency across multiple data structures.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.