hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3707) Frequent DiskOutOfSpaceException on almost-full datanodes
Date Mon, 07 Jul 2008 17:40:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611252#action_12611252 ]

Hairong Kuang commented on HADOOP-3707:
---------------------------------------

I believe this is triggered by the much faster block replication scheduling introduced by
HADOOP-2606. Currently, when the namenode chooses a replication target, it looks only at the
remaining disk space the datanode reported in its previous heartbeat; it does not account for
the space that will be consumed by blocks already scheduled for replication to that datanode.
If the previous heartbeat says the datanode has space for 5 blocks but the namenode schedules
as many as 50 blocks to it, the 45 over-assigned blocks will fail with DiskOutOfSpaceException.
The namenode should take the scheduled blocks into account as well when choosing a replication
target.
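
A minimal sketch of that accounting, with hypothetical names (the real fix would belong in the
namenode's per-datanode bookkeeping, not in a standalone class like this): count the blocks
scheduled to a datanode since its last heartbeat, and discount the space they will consume from
the free space the heartbeat reported.

{noformat}
// Minimal sketch with hypothetical names -- not the actual Hadoop namenode code.
// Track blocks scheduled to a datanode since its last heartbeat so that target
// selection can discount the space those in-flight blocks will consume.
class DatanodeDescriptorSketch {
    private long remaining;       // free bytes reported by the last heartbeat
    private int blocksScheduled;  // blocks scheduled since that heartbeat

    synchronized void updateHeartbeat(long remainingBytes) {
        remaining = remainingBytes;
        blocksScheduled = 0;      // assume the new report reflects earlier writes
    }

    synchronized void incBlocksScheduled() {
        blocksScheduled++;
    }

    // A viable target must have reported enough free space to cover the new
    // block plus everything already scheduled but not yet written.
    synchronized boolean isGoodTarget(long blockSize) {
        long scheduledBytes = (long) blocksScheduled * blockSize;
        return remaining - scheduledBytes >= blockSize;
    }
}
{noformat}

Resetting the counter on every heartbeat is an over-simplification, since scheduled blocks may
still be in flight when the next report arrives; a real implementation would need to age the
counter more carefully.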

> Frequent DiskOutOfSpaceException on almost-full datanodes
> ---------------------------------------------------------
>
>                 Key: HADOOP-3707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3707
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>
> On a datanode that is completely full (leaving only the reserved space), we frequently see the target node reporting:
> {noformat}
> 2008-07-07 16:54:44,707 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3328886742742952100 src: /11.1.11.111:22222 dest: /11.1.11.111:22222
> 2008-07-07 16:54:44,708 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_3328886742742952100 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
> 2008-07-07 16:54:44,708 ERROR org.apache.hadoop.dfs.DataNode: 33.3.33.33:22222:DataXceiver: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
>         at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:444)
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:716)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2187)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1113)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Sender reporting:
> {noformat}
> 2008-07-07 16:54:44,712 INFO org.apache.hadoop.dfs.DataNode: 11.1.11.111:22222:Exception writing block blk_3328886742742952100 to mirror 33.3.33.33:22222
> java.io.IOException: Broken pipe
>         at sun.nio.ch.FileDispatcher.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:75)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2292)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2411)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2476)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1204)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Since it's not constantly happening, my guess is that whenever the datanode gets some small amount of space available, the namenode over-assigns blocks, which can fail the block pipeline.
> (Note: before 0.17, the namenode was much slower at assigning blocks.)
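
For context, the DiskOutOfSpaceException in the traces above is thrown on the datanode when its
round-robin volume selection finds no disk with room for the incoming block. A rough sketch of
that check, as a hypothetical simplification rather than the real FSDataset code:

{noformat}
import java.io.IOException;

// Rough sketch of the datanode-side round-robin volume selection, the code
// path at the top of the target node's stack trace. Hypothetical
// simplification -- not the actual FSDataset$FSVolumeSet.getNextVolume.
class FSVolumeSetSketch {
    interface Volume {
        long getAvailable();  // free bytes remaining on this volume
    }

    private final Volume[] volumes;
    private int curVolume = 0;

    FSVolumeSetSketch(Volume[] volumes) {
        this.volumes = volumes;
    }

    synchronized Volume getNextVolume(long blockSize) throws IOException {
        int startVolume = curVolume;
        while (true) {
            Volume volume = volumes[curVolume];
            curVolume = (curVolume + 1) % volumes.length;
            if (volume.getAvailable() >= blockSize) {
                return volume;
            }
            // One full pass found no volume with room: refuse the write.
            // (The real code throws DiskChecker$DiskOutOfSpaceException,
            // which the sender then observes as a broken pipe.)
            if (curVolume == startVolume) {
                throw new IOException("Insufficient space for an additional block");
            }
        }
    }
}
{noformat}

Because this check happens only when the write actually arrives, an over-assigning namenode
learns about the shortage one failed pipeline at a time, which matches the intermittent failures
described above.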

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

