From: "dhruba borthakur (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Mon, 11 Aug 2008 10:42:46 -0700 (PDT)
Message-ID: <922101774.1218476566995.JavaMail.jira@brutus>
Subject: [jira] Resolved: (HADOOP-1113) namenode slowdown when orphan block(s) left in neededReplication

     [ https://issues.apache.org/jira/browse/HADOOP-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur resolved HADOOP-1113.
--------------------------------------

       Resolution: Cannot Reproduce
    Fix Version/s: 0.19.0

This does not occur in later releases of Hadoop. If we see this problem again, we can re-open this JIRA. I am closing this one as "invalid".

> namenode slowdown when orphan block(s) left in neededReplication
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1113
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>             Fix For: 0.19.0
>
>
> There were about 200 files that had some under-replicated blocks. A "dfs -setrep 4" followed by a "dfs -setrep 3" was done on these files. Most of the replications took place, but the namenode CPU usage got stuck at 99%. The cluster has about 450 datanodes.
> In the stack trace of the namenode, we saw that there is always one thread of the following type:
> "IPC Server handler 3 on 8020" daemon prio=1 tid=0x0000002d941c7d30 nid=0x2d52 runnable [0x0000000042072000..0x0000000042072eb0]
>         at org.apache.hadoop.dfs.FSDirectory.getFileByBlock(FSDirectory.java:745)
>         - waiting to lock <0x0000002aa212f030> (a org.apache.hadoop.dfs.FSDirectory$INode)
>         at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2155)
>         - locked <0x0000002aa210f6b8> (a java.util.TreeSet)
>         - locked <0x0000002aa21401a0> (a org.apache.hadoop.dfs.FSNamesystem)
>         at org.apache.hadoop.dfs.NameNode.sendHeartbeat(NameNode.java:521)
>         at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:337)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:538)
> Also, the namenode is currently not issuing any replication requests (as seen from the namenode log). A new "setrep" command took effect immediately.
> My belief is that one or more blocks are permanently stuck in neededReplication. This causes every heartbeat request to do a lot of additional processing, leading to higher CPU usage. One possibility is that all datanodes that host the replicas of the block in neededReplication are down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
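
The hypothesis in the quoted report can be illustrated with a small model. This is only a sketch under assumptions, not the FSNamesystem code: the class OrphanBlockSketch, the liveReplicas map, and the blocksExamined counter are hypothetical, and pendingTransfers and neededReplication merely borrow names from the report. It assumes that a heartbeat walks the whole queue and that a block is dequeued only when some datanode holding a live replica reports in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the suspected behaviour; not the actual namenode code. */
public class OrphanBlockSketch {
    // Blocks still waiting for more replicas (stands in for neededReplication).
    private final TreeSet<Long> neededReplication = new TreeSet<>();
    // Block id -> datanodes that currently hold a live replica (assumed structure).
    private final Map<Long, Set<String>> liveReplicas = new HashMap<>();
    // Total number of queue entries examined across all heartbeats.
    private long blocksExamined = 0;

    public synchronized void addBlock(long blockId, Set<String> holders) {
        neededReplication.add(blockId);
        liveReplicas.put(blockId, holders);
    }

    /** Called once per heartbeat; returns the blocks this datanode should copy. */
    public synchronized List<Long> pendingTransfers(String datanode) {
        List<Long> work = new ArrayList<>();
        for (Long blockId : neededReplication) {
            blocksExamined++;
            Set<String> holders = liveReplicas.get(blockId);
            if (holders != null && holders.contains(datanode)) {
                work.add(blockId);
            }
            // A block with no live holder matches no datanode, so it is
            // examined on every heartbeat but never scheduled.
        }
        // Only scheduled blocks leave the queue; orphans stay behind.
        neededReplication.removeAll(work);
        return work;
    }

    public synchronized int queueSize() {
        return neededReplication.size();
    }

    public synchronized long blocksExamined() {
        return blocksExamined;
    }
}
```

In this model, a block registered with an empty holder set is never removed, so every later heartbeat re-examines it while holding the lock. With about 450 datanodes sending heartbeats, that repeated work would be consistent with the symptom described, although the sketch does not establish that this is what happened in the real cluster.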