Message-ID: <1741514507.1205528904288.JavaMail.jira@brutus>
Date: Fri, 14 Mar 2008 14:08:24 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
Reply-To: core-dev@hadoop.apache.org
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2606) Namenode unstable when replicating 500k blocks at once
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578929#action_12578929 ]

dhruba borthakur commented on HADOOP-2606:
------------------------------------------

If the namenode always deterministically chooses the same datanode as the source of a replication request, and that source machine has a problem (bad disk, CRC error, read-only partition, etc.), then the replication request will never succeed.

It could also be that there is a non-transient network failure between the source datanode and the target datanode, while both datanodes are still successfully sending heartbeats to the namenode. No CRC errors occur in this case, yet the replication request between these two datanodes will keep failing indefinitely.

Isn't it better if we can ensure that the namenode tries different datanodes as the source of a replication request? (A rough sketch of this idea is appended after this message.)

> Namenode unstable when replicating 500k blocks at once
> ------------------------------------------------------
>
>                 Key: HADOOP-2606
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2606
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.3
>            Reporter: Koji Noguchi
>            Assignee: Konstantin Shvachko
>             Fix For: 0.17.0
>
>         Attachments: ReplicatorNew.patch, ReplicatorTestOld.patch
>
>
> We tried to decommission about 40 nodes at once, each containing 12k blocks (about 500k total).
> (This also happened when we first tried to decommission 2 million blocks.)
> Clients started experiencing "java.lang.RuntimeException: java.net.SocketTimeoutException: timed out waiting for rpc response" and the namenode was stuck at 100% CPU.
> It was spending most of its time on one thread:
>
> "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@7f401d28" daemon prio=10 tid=0x0000002e10702800 nid=0x6718 runnable [0x0000000041a42000..0x0000000041a42a30]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.dfs.FSNamesystem.containingNodeList(FSNamesystem.java:2766)
>         at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2870)
>         - locked <0x0000002aa3cef720> (a org.apache.hadoop.dfs.UnderReplicatedBlocks)
>         - locked <0x0000002aa3c42e28> (a org.apache.hadoop.dfs.FSNamesystem)
>         at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1928)
>         at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1868)
>         at java.lang.Thread.run(Thread.java:619)
>
> We confirmed that the namenode was not in full GC when this problem happened.
> Also, dfsadmin -metasave showed that "Blocks waiting for replication" was decreasing very slowly.
> I believe this is not specific to decommissioning; the same problem would happen if we lost one rack.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
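A minimal sketch of the randomized source selection suggested in the comment above: instead of always picking the first replica holder as the source of a re-replication request, pick uniformly at random among the live holders, so that a single datanode with a bad disk or a broken link to the target cannot stall a block's replication forever. This is a hypothetical illustration, not the FSNamesystem code or the attached patches; datanodes are modeled here as plain strings and the class name ReplicationSourceChooser is made up.

import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of randomized replication-source selection.
 * Not the actual Hadoop implementation.
 */
public class ReplicationSourceChooser {
  private final Random random = new Random();

  /**
   * Pick a source datanode for re-replicating a block from the list of
   * datanodes currently holding a replica. Choosing at random (rather
   * than deterministically taking the first holder) means that if one
   * source keeps failing, later scheduling passes will likely try a
   * different one.
   */
  public String chooseSource(List<String> replicaHolders) {
    if (replicaHolders.isEmpty()) {
      return null;                              // nothing to replicate from
    }
    int idx = random.nextInt(replicaHolders.size());
    return replicaHolders.get(idx);
  }
}

Under this scheme a permanently bad source only delays a given block by one scheduling round on average, rather than blocking it indefinitely; the pending-replication monitor would time out the failed request and the next pass would most likely draw a different source.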