hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1955) Corrupted block replication retries for ever
Date Wed, 03 Oct 2007 23:20:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532285 ]

Raghu Angadi commented on HADOOP-1955:
--------------------------------------

The test is currently in TestPendingReplication.java. There is also TestReplication.java; maybe we should move the test there. Let me know.


> Corrupted block replication retries for ever
> --------------------------------------------
>
>                 Key: HADOOP-1955
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1955
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.14.2
>
>         Attachments: HADOOP-1955-branch14.patch, HADOOP-1955-branch14.patch, HADOOP-1955.patch, HADOOP-1955.patch, HADOOP-1955.patch
>
>
> When replicating a corrupted block, the receiving side rejects the block due to a checksum error. The Namenode keeps on retrying (with the same source datanode).
> Fsck shows those blocks as under-replicated.
> [Namenode log]
> {noformat} 
> 2007-09-27 02:00:05,273 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 99.2.99.111
> ...
> 2007-09-27 02:01:02,618 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.37:9999
> 2007-09-27 02:10:03,843 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-5925066143536023890
> 2007-09-27 02:10:08,248 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.35:9999
> 2007-09-27 02:20:03,848 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-5925066143536023890
> 2007-09-27 02:20:08,646 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.19:9999
> (repeats)
> {noformat} 
> [Datanode(sender) 99.9.99.11 log]
> {noformat} 
> 2007-09-27 02:01:04,493 INFO org.apache.hadoop.dfs.DataNode: Starting thread to transfer block blk_-5925066143536023890 to [Lorg.apache.hadoop.dfs.DatanodeInfo;@e58187
> 2007-09-27 02:01:05,153 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-5925066143536023890 to 74.6.128.37:50010 got java.net.SocketException: Connection reset
>   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at org.apache.hadoop.dfs.DataNode.sendBlock(DataNode.java:1231)
>   at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1280)
>   at java.lang.Thread.run(Thread.java:619)
> (repeats)
> {noformat} 
> [Datanode(one of the receivers) 99.9.99.37 log]
> {noformat} 
> 2007-09-27 02:01:05,150 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_-5925066143536023890 from /74.6.128.33:57605
>   at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:902)
>   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
>   at java.lang.Thread.run(Thread.java:619)
> {noformat} 
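The failure mode in the logs above can be reproduced in miniature: the source replica's bytes no longer match the checksum recorded at write time, the receiver recomputes the checksum and rejects the transfer, and the Namenode reschedules with a new target but the same source, so every attempt fails. The sketch below is a hypothetical model, not Hadoop code: the class `CorruptReplicaRetrySketch`, its `Replica`, `receive` and `replicate` members, and the single whole-block CRC32 (HDFS actually checksums the block in chunks) are all assumptions for illustration. It caps retries at `maxAttempts` only so the demo terminates; the bug reported here is that the real retry loop had no such bound. It does not show the HADOOP-1955 patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical model of the HADOOP-1955 retry loop (not Hadoop's actual classes).
public class CorruptReplicaRetrySketch {

    // A block replica held by one datanode, plus the checksum recorded at write time.
    static final class Replica {
        final String datanode;
        final byte[] data;
        final long storedChecksum;

        Replica(String datanode, byte[] data, long storedChecksum) {
            this.datanode = datanode;
            this.data = data;
            this.storedChecksum = storedChecksum;
        }
    }

    // Simplification: one CRC32 over the whole block instead of per-chunk checksums.
    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // Receiver side: recompute the checksum and reject the block on mismatch,
    // analogous to the "Unexpected checksum mismatch" error in the receiver log.
    static void receive(Replica source, String target) throws IOException {
        if (crc32(source.data) != source.storedChecksum) {
            throw new IOException("Unexpected checksum mismatch while writing block from "
                    + source.datanode + " to " + target);
        }
    }

    // Namenode side: each retry picks a new target but always the same source.
    // Returns how many attempts failed before success or before hitting maxAttempts.
    static int replicate(Replica source, List<String> targets, int maxAttempts) {
        int failures = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String target = targets.get(attempt % targets.size());
            try {
                receive(source, target);
                return failures; // replication succeeded
            } catch (IOException e) {
                failures++; // pending replication times out; the block is scheduled again
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        byte[] original = "block contents".getBytes(StandardCharsets.UTF_8);
        long checksum = crc32(original);

        // Flip one bit in the copy stored on the source datanode.
        byte[] corrupted = original.clone();
        corrupted[0] ^= 0x01;

        Replica bad = new Replica("99.9.99.11:9999", corrupted, checksum);
        List<String> targets = List.of("99.9.99.37:9999", "99.9.99.35:9999", "99.9.99.19:9999");

        System.out.println("failed attempts: " + replicate(bad, targets, 5));
    }
}
```

Run as-is (Java 9+ for `List.of`), it prints `failed attempts: 5`: CRC32 catches every single-bit error, so changing the target never helps while the source stays fixed.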

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

