Return-Path:
Delivered-To: apmail-hadoop-core-dev-archive@www.apache.org
Received: (qmail 32688 invoked from network); 10 Jul 2008 22:39:23 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2)
    by minotaur.apache.org with SMTP; 10 Jul 2008 22:39:23 -0000
Received: (qmail 39157 invoked by uid 500); 10 Jul 2008 22:39:22 -0000
Delivered-To: apmail-hadoop-core-dev-archive@hadoop.apache.org
Received: (qmail 39129 invoked by uid 500); 10 Jul 2008 22:39:22 -0000
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: core-dev@hadoop.apache.org
Delivered-To: mailing list core-dev@hadoop.apache.org
Received: (qmail 39118 invoked by uid 99); 10 Jul 2008 22:39:22 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Jul 2008 15:39:22 -0700
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Jul 2008 22:38:39 +0000
Received: from brutus (localhost [127.0.0.1])
    by brutus.apache.org (Postfix) with ESMTP id B3F3B234C165
    for ; Thu, 10 Jul 2008 15:38:31 -0700 (PDT)
Message-ID: <1627003991.1215729511735.JavaMail.jira@brutus>
Date: Thu, 10 Jul 2008 15:38:31 -0700 (PDT)
From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3707) Frequent DiskOutOfSpaceException on almost-full datanodes
In-Reply-To: <407995319.1215451351564.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612677#action_12612677 ]

Raghu Angadi commented on HADOOP-3707:
--------------------------------------

Advantages of the current patch:

* fixes a real problem observed by users and increases the robustness of DFS.
* is certainly an improvement over what we have.
* has no regressions.
* does not slow down any NameNode activity.
* in my opinion, does not increase or decrease the complexity or change the nature of the big beast "FSNamesystem".
* once it works well, the counter can be used for other scheduling activities.
* I don't think "approx" in the name should distract much. It is as accurate as it can be, and we deal with small departures from accuracy in case of errors. It is only guilty of living with uncertainty :).

Of course we can change the patch; for example, we can increase the "roll interval" from 5 minutes.


> Frequent DiskOutOfSpaceException on almost-full datanodes
> ---------------------------------------------------------
>
>                 Key: HADOOP-3707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3707
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0, 0.19.0
>
>         Attachments: HADOOP-3707-branch-017.patch, HADOOP-3707-branch-017.patch, HADOOP-3707-trunk.patch, HADOOP-3707-trunk.patch, HADOOP-3707-trunk.patch
>
>
> On a datanode which is completely full (leaving reserve space), we frequently see
> target node reporting,
> {noformat}
> 2008-07-07 16:54:44,707 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3328886742742952100 src: /11.1.11.111:22222 dest: /11.1.11.111:22222
> 2008-07-07 16:54:44,708 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_3328886742742952100 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
> 2008-07-07 16:54:44,708 ERROR org.apache.hadoop.dfs.DataNode: 33.3.33.33:22222:DataXceiver: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
>         at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:444)
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:716)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.(DataNode.java:2187)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1113)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Sender reporting
> {noformat}
> 2008-07-07 16:54:44,712 INFO org.apache.hadoop.dfs.DataNode: 11.1.11.111:22222:Exception writing block blk_3328886742742952100 to mirror 33.3.33.33:22222
> java.io.IOException: Broken pipe
>         at sun.nio.ch.FileDispatcher.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:75)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2292)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2411)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2476)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1204)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Since it's not constantly happening, my guess is that whenever a datanode gets some small space available, the namenode over-assigns blocks, which can fail the block
> pipeline.
> (Note: before 0.17, the namenode was much slower in assigning blocks.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
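
The comment above mentions an "approx" counter and a 5-minute "roll interval" without showing how such a counter might behave. One way to picture it is a two-bucket rolling counter of blocks scheduled to a datanode: increments go into the current bucket, and each roll discards the older bucket, so a scheduled block whose completion is never reported ages out after at most two roll intervals. The sketch below is only an illustration under that assumption; the class name `ApproxScheduledCounter` and all method names are hypothetical and are not taken from the attached patch.

```java
/**
 * Hypothetical sketch of an approximate "blocks scheduled" counter.
 * Not the code from the HADOOP-3707 patch; names are illustrative only.
 */
final class ApproxScheduledCounter {
    private final long rollIntervalMs; // e.g. 5 minutes = 300_000 ms
    private int curr;                  // blocks scheduled since the last roll
    private int prev;                  // blocks scheduled in the interval before that
    private long lastRollMs;

    ApproxScheduledCounter(long rollIntervalMs, long nowMs) {
        this.rollIntervalMs = rollIntervalMs;
        this.lastRollMs = nowMs;
    }

    /** Approximate number of blocks still in flight to this node. */
    int get() {
        return curr + prev;
    }

    /** Called when a block is scheduled to be written to this node. */
    void increment() {
        curr++;
    }

    /** Called when the node reports that a block has been received. */
    void decrement() {
        if (prev > 0) {
            prev--;
        } else if (curr > 0) {
            curr--;
        }
        // Otherwise ignore: a roll already dropped the matching increment.
    }

    /** Drops the older bucket once the roll interval has elapsed. */
    void rollIfDue(long nowMs) {
        if (nowMs - lastRollMs > rollIntervalMs) {
            prev = curr;
            curr = 0;
            lastRollMs = nowMs;
        }
    }
}
```

With a counter like this, a scheduler could add `get()` times the block size to a node's reported usage before deciding whether it can take another block. A larger roll interval keeps stale entries around longer, while a smaller one forgets slow writes sooner, which is the trade-off behind changing the interval from 5 minutes.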