From: "dhruba borthakur (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Tue, 11 Mar 2008 00:52:46 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-2976) Blocks staying underreplicated (for unclosed file)
Message-ID: <959150855.1205221966306.JavaMail.jira@brutus>
In-Reply-To: <859532889.1204936066261.JavaMail.jira@brutus>
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577325#action_12577325 ]

dhruba borthakur commented on HADOOP-2976:
------------------------------------------

A block report computes under-replication and over-replication only for blocks that are not already in the blocksMap. In our case, the first datanode confirmed the block and the block was inserted into the blocksMap. When the next block report arrives from this datanode, the namenode notices that the blocksMap already contains this information, so it does not check whether the block is over-replicated or under-replicated. I guess it would be expensive to compute under-replication and over-replication for every block in a block report.

> Blocks staying underreplicated (for unclosed file)
> --------------------------------------------------
>
>                  Key: HADOOP-2976
>                  URL: https://issues.apache.org/jira/browse/HADOOP-2976
>              Project: Hadoop Core
>           Issue Type: Bug
>           Components: dfs
>     Affects Versions: 0.15.3
>             Reporter: Koji Noguchi
>             Assignee: dhruba borthakur
>             Priority: Minor
>              Fix For: 0.17.0
>
>          Attachments: leaseExpiryReplication.patch
>
>
> We had two files staying underreplicated for over a day.
> I checked that these under-replicated blocks are not corrupted.
> (They were both task tmp files and most likely didn't get closed.)
> Taking one file, /aaa/_task_200803040823_0001_r_000421_0/part-00421
> The namenode log (namenode.log.2008-03-04) showed
> 2008-03-04 16:19:21,478 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /aaa/_task_200803040823_0001_r_000421_0/part-00421. blk_-7848645760735416126
> 2008-03-04 16:19:24,357 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 11.1.111.111:22222 is added to blk_-7848645760735416126
> On the datanode 11.1.111.111, it showed
> 2008-03-04 16:19:24,358 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-7848645760735416126 from /55.55.55.55 and operation failed at /22.2.222.22
> On the second datanode 22.2.222.22, it showed
> 2008-03-04 16:19:21,578 INFO org.apache.hadoop.dfs.DataNode: Exception writing to mirror 33.3.33.33
> java.net.SocketException: Connection reset
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:1333)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1386)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:938)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-03-04 16:19:24,358 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1394)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:938)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>         at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
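[Editor's note] The namenode behavior described in the comment can be sketched as a toy model. This is a simplified, hypothetical illustration, not the actual org.apache.hadoop.dfs.FSNamesystem code; all names (BlockReportModel, processReportedBlock, checkCount) are invented for exposition. The point it demonstrates: the under/over-replication check runs only when a reported block is not yet in the blocksMap, so later reports for an already-known block never re-trigger it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of block-report handling as described in the comment above.
// Once a block is in blocksMap, subsequent reports only record the
// location; the (expensive) replication check is skipped.
class BlockReportModel {
    // block id -> set of datanode locations holding a replica
    private final Map<Long, Set<String>> blocksMap = new HashMap<>();
    // how many times the replication check ran per block (for illustration)
    private final Map<Long, Integer> checks = new HashMap<>();

    void processReportedBlock(long blockId, String datanode) {
        if (!blocksMap.containsKey(blockId)) {
            // First time this block is seen: insert it and run the
            // under/over-replication computation (modeled as a counter).
            blocksMap.put(blockId, new HashSet<>());
            checks.merge(blockId, 1, Integer::sum);
        }
        // Known block: just record the location, skip the check.
        blocksMap.get(blockId).add(datanode);
    }

    int checkCount(long blockId) {
        return checks.getOrDefault(blockId, 0);
    }
}
```

Under this model, a block that ends up under-replicated after its first confirmation (as with the unclosed file here) is never re-examined by later block reports from the same datanode, matching the observed stuck state.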