hadoop-hdfs-issues mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure
Date Wed, 26 Jan 2011 00:26:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986788#action_12986788 ]

Koji Noguchi commented on HDFS-1595:

bq. So this faulty node F has no problem receiving large amount of data? 

This faulty node had problems sending/receiving large amounts of data and was failing most of the time.
The bigger the data, the higher the chance of failure.  I think smaller data (say, less than
1MB) was going through 99% of the time.
So heartbeats, acks and so forth were probably working.

When I tried to scp some blocks out from this node for data recovery, it kept on failing with:

blk_-113193561174013799    0%    0     0.0KB/s   --:-- ETA
Corrupted MAC on input.
Finished discarding for aa.bb.cc.dd
lost connection


So I believe *most* of the DFSClient writes were failing when going through this node.
And when a write did go through (after hundreds of write attempts for different blocks),
it would then fail on all the following replications but succeed on 'close' with 1 replica,
leading to this bug.
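The size-dependent failure pattern described above is what one would expect if each network packet fails independently: the whole-transfer success probability drops exponentially with transfer size. A minimal sketch (hypothetical numbers, not measurements from the faulty node; HDFS packets are 64KB by default, so 1MB is roughly 16 packets and 100MB roughly 1600):

```java
// Hypothetical model: each packet independently succeeds with probability p,
// so a transfer of n packets succeeds with probability p^n. With p = 0.995,
// small transfers mostly succeed while large ones almost never do -- matching
// the observation that <1MB went through ~99% of the time but big blocks failed.
public class TransferOdds {

    // Probability that all n packets of a transfer make it through.
    static double successProbability(double perPacketP, int packets) {
        return Math.pow(perPacketP, packets);
    }

    public static void main(String[] args) {
        // ~16 packets for a 1MB transfer vs ~1600 for 100MB at 64KB/packet
        System.out.printf("1MB:   %.3f%n", successProbability(0.995, 16));
        System.out.printf("100MB: %.6f%n", successProbability(0.995, 1600));
    }
}
```

This also explains why heartbeats and acks (tiny messages) kept working while block transfers failed.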

> DFSClient may incorrectly detect datanode failure
> -------------------------------------------------
>                 Key: HDFS-1595
>                 URL: https://issues.apache.org/jira/browse/HDFS-1595
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.20.4
>            Reporter: Tsz Wo (Nicholas), SZE
>            Priority: Critical
>         Attachments: hdfs-1595-idea.txt
> Suppose a source datanode S is writing to a destination datanode D in a write pipeline.
> We have an implicit assumption that _if S catches an exception when it is writing to D,
> then D is faulty and S is fine._  As a result, DFSClient will take out D from the pipeline,
> reconstruct the write pipeline with the remaining datanodes and then continue writing.
> However, we found a case where the faulty machine F is in fact S, not D.  In the case
> we found, F has a faulty network interface (or a faulty switch port) such that the
> interface works fine when sending out a small amount of data, say 1MB, but fails when
> sending out a large amount of data, say 100MB.  Reading works fine for any data size.
> It is even worse if F is the first datanode in the pipeline.  Consider the following:
> # DFSClient creates a pipeline with three datanodes.  The first datanode is F.
> # F catches an IOException when writing to the second datanode.  Then, F reports that
> the second datanode has an error.
> # DFSClient removes the second datanode from the pipeline and continues writing with
> the remaining datanode(s).
> # The pipeline now has two datanodes, but (2) and (3) repeat.
> # Now, only F remains in the pipeline.  DFSClient continues writing with one replica
> in F.
> # The write succeeds and DFSClient is able to *close the file successfully*.
> # The block is under-replicated.  The NameNode schedules replication from F to some
> other datanode D.
> # The replication fails for the same reason.  D reports to the NameNode that the
> replica in F is corrupted.
> # The NameNode marks the replica in F as corrupted.
> # The block is corrupted since no valid replica is available.
> This is a *data loss* scenario.
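The pipeline-shrinking loop in steps 1-5 can be sketched as a toy simulation (hypothetical names and logic, not actual DFSClient code): the first node's send fails, the client blames the downstream node and removes it, and the loop repeats until only the faulty node F remains.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of the flawed error-attribution described in the scenario:
// when a write from the pipeline's first node fails, the client assumes the
// *downstream* node is bad and drops it, so a faulty first node F survives.
public class PipelineSimulation {

    // Hypothetical fault model: F's NIC drops large payloads; others are healthy.
    static boolean send(String sender, int payloadMB) {
        return !sender.equals("F") || payloadMB <= 1;
    }

    // Shrinks the pipeline until the first node's transfer "succeeds"
    // or only one node is left (at which point close() can still succeed).
    static List<String> writeBlock(List<String> pipeline, int payloadMB) {
        List<String> nodes = new ArrayList<>(pipeline);
        while (nodes.size() > 1) {
            if (send(nodes.get(0), payloadMB)) {
                break; // transfer succeeded; keep the remaining replicas
            }
            nodes.remove(1); // the flawed assumption: blame the second node
        }
        return nodes;
    }

    public static void main(String[] args) {
        // A 100MB block written through pipeline [F, D1, D2]:
        System.out.println(writeBlock(List.of("F", "D1", "D2"), 100)); // prints [F]
    }
}
```

With a healthy first node the pipeline is untouched; with F first, every healthy node is evicted and the single remaining replica sits on the one machine that cannot replicate it out.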

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
