hadoop-common-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1955) Corrupted block replication retries for ever
Date Thu, 27 Sep 2007 18:18:50 GMT
Corrupted block replication retries for ever
--------------------------------------------

                 Key: HADOOP-1955
                 URL: https://issues.apache.org/jira/browse/HADOOP-1955
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.14.1
            Reporter: Koji Noguchi
            Priority: Blocker


When a corrupted block is replicated, the receiving datanode rejects it because of a checksum error.
The namenode keeps retrying indefinitely, always with the same source datanode and only the target changing.
Fsck reports those blocks as under-replicated.
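
One way to picture a possible remedy is for the namenode to remember which source datanodes have already failed to transfer a given block and to stop choosing them. The sketch below is purely illustrative and is not the fix adopted for this issue: the class name `ReplicationRetrySketch` and the methods `recordTimeout` and `chooseSource` are hypothetical and do not exist in Hadoop.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch only; none of these names are part of Hadoop. */
class ReplicationRetrySketch {
    // For each block id, the source datanodes whose transfer attempts timed out.
    private final Map<String, Set<String>> failedSources = new HashMap<>();

    /** Record that a transfer of blockId from the given source timed out. */
    void recordTimeout(String blockId, String source) {
        failedSources.computeIfAbsent(blockId, k -> new HashSet<>()).add(source);
    }

    /**
     * Return the first replica holder that has not yet failed for this block.
     * An empty result means every known replica has failed, so the caller
     * could flag the block as suspect instead of scheduling another retry.
     */
    Optional<String> chooseSource(String blockId, List<String> replicaHolders) {
        Set<String> failed =
            failedSources.getOrDefault(blockId, Collections.<String>emptySet());
        for (String holder : replicaHolders) {
            if (!failed.contains(holder)) {
                return Optional.of(holder);
            }
        }
        return Optional.empty();
    }
}
```

With a single replica holder, as in the logs below where 99.9.99.11 is always the sender, the first call to `chooseSource` would return that node; after one `recordTimeout` for the same block it would return an empty result, which ends the retry loop instead of repeating it every ten minutes.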


[Namenode log]
{noformat} 
2007-09-27 02:00:05,273 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck:
lost heartbeat from 99.2.99.111
...
2007-09-27 02:01:02,618 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer:
ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.37:9999
2007-09-27 02:10:03,843 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor
timed out block blk_-5925066143536023890
2007-09-27 02:10:08,248 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer:
ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.35:9999
2007-09-27 02:20:03,848 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor
timed out block blk_-5925066143536023890
2007-09-27 02:20:08,646 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer:
ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.19:9999
(repeats)
{noformat} 

[Datanode(sender) 99.9.99.11 log]
{noformat} 
2007-09-27 02:01:04,493 INFO org.apache.hadoop.dfs.DataNode: Starting thread to transfer block
blk_-5925066143536023890 to [Lorg.apache.hadoop.dfs.DatanodeInfo;@e58187
2007-09-27 02:01:05,153 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-5925066143536023890
to 74.6.128.37:50010 got java.net.SocketException: Connection reset
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
  at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
  at java.io.DataOutputStream.write(DataOutputStream.java:90)
  at org.apache.hadoop.dfs.DataNode.sendBlock(DataNode.java:1231)
  at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1280)
  at java.lang.Thread.run(Thread.java:619)
(repeats)
{noformat} 

[Datanode(one of the receivers) 99.9.99.37 log]
{noformat} 
2007-09-27 02:01:05,150 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException:
Unexpected checksum mismatch while writing blk_-5925066143536023890 from /74.6.128.33:57605
  at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:902)
  at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
  at java.lang.Thread.run(Thread.java:619)
{noformat} 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.