hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6937) Another issue in handling checksum errors in write pipeline
Date Mon, 25 Aug 2014 21:52:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109808#comment-14109808 ]

Yongjun Zhang commented on HDFS-6937:
-------------------------------------

Hi [~cmccabe],

Thanks for the comments and addition. 

As proposed in the jira report, we need a way to communicate DN3's status back to DN2.
DN3 can even tell DN2 where the checksum error happened, and DN2 can then verify its own
checksum at that location. If DN2 sees a checksum error at the same location, DN3 is not
the culprit, so we should not take out DN3. If DN2's data is good, then DN3 is the culprit
and we reconstruct the pipeline without DN3.
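
To make the verification step concrete, here is a minimal sketch (plain Java, not the actual
DataNode code) of what DN2 would do with the reported offset: re-read the chunk from its own
block file and recompute the checksum. The class name and the direct use of CRC32 are
illustrative assumptions; the real replica and meta-file handling is more involved.

{code:java}
import java.util.zip.CRC32;

public class ChunkVerifier {

  /** Recompute the CRC of the on-disk chunk and compare with the stored one. */
  public static boolean locallyCorrupt(byte[] onDiskChunk, long storedCrc) {
    CRC32 crc = new CRC32();
    crc.update(onDiskChunk, 0, onDiskChunk.length);
    return crc.getValue() != storedCrc;
  }

  public static void main(String[] args) {
    // Stand-in for the chunk bytes read back from the block file.
    byte[] chunk = "chunk payload read from the block file".getBytes();
    CRC32 good = new CRC32();
    good.update(chunk, 0, chunk.length);
    // false here means the local replica is clean at that offset,
    // so the downstream DN (DN3) is the culprit.
    System.out.println(locallyCorrupt(chunk, good.getValue()));
  }
}
{code}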

If DN2's data is corrupted, DN2 can do the same: propagate the checksum error info back to
DN1, and DN1 then does its own checksum verification to confirm whether its data is good.
If DN1's data is also corrupted, then DN1 is the one to take out, and the client needs to
reconstruct the pipeline by throwing away just DN1 and keeping DN2 and DN3.

If DN1 is good, then the issue is at DN2.
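
The resulting decision rule can be stated as a small function. This is only a sketch of the
logic above, with the argument ordering as an assumption: walking upstream from the detecting
node, the culprit is the most upstream DN found corrupt; if every upstream DN is clean, the
tail DN (or the link to it) is at fault.

{code:java}
public class CulpritPicker {

  /**
   * @param upstreamCorrupt verification results ordered from the node just
   *        above the detector up to the head, e.g. {dn2Corrupt, dn1Corrupt}
   *        for the DN1 -> DN2 -> DN3 pipeline where DN3 detected the error
   * @return pipeline index of the DN to exclude (0 = head, i.e. DN1)
   */
  public static int pickCulprit(boolean[] upstreamCorrupt) {
    int tailIndex = upstreamCorrupt.length;  // detector's pipeline index
    int culprit = tailIndex;                 // default: blame the tail
    for (int i = 0; i < upstreamCorrupt.length; i++) {
      if (!upstreamCorrupt[i]) {
        break;                               // clean node: fault lies just below it
      }
      culprit = tailIndex - 1 - i;           // most upstream corrupt DN so far
    }
    return culprit;
  }

  public static void main(String[] args) {
    // DN1 -> DN2 -> DN3, DN3 detects the error.
    System.out.println(pickCulprit(new boolean[]{false, false})); // 2: take out DN3
    System.out.println(pickCulprit(new boolean[]{true,  false})); // 1: take out DN2
    System.out.println(pickCulprit(new boolean[]{true,  true}));  // 0: take out DN1
  }
}
{code}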

This way, we minimize the impact of unnecessarily throwing away good DNs. We need to be aware
that the corrupted chunks were already acknowledged to the client, so the client had better
buffer the data block and be able to rewrite the whole block to the newly constructed pipeline.
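
A rough sketch of that client-side buffering, with a hypothetical PipelineSink interface
standing in for the real pipeline stream: every write is buffered locally before it is sent,
so after a pipeline rebuild the whole block can be replayed from offset zero.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class ReplayableBlockWriter {
  // Buffers everything written for the current block.
  private final ByteArrayOutputStream blockBuffer = new ByteArrayOutputStream();

  /** Hypothetical stand-in for the downstream write pipeline. */
  interface PipelineSink {
    void write(byte[] data, int off, int len) throws IOException;
  }

  /** Buffer every write locally before sending it down the pipeline. */
  public void write(PipelineSink pipeline, byte[] data) throws IOException {
    blockBuffer.write(data, 0, data.length);
    pipeline.write(data, 0, data.length);
  }

  /** After the pipeline is rebuilt, resend the whole block from offset zero. */
  public void replayTo(PipelineSink newPipeline) throws IOException {
    byte[] all = blockBuffer.toByteArray();
    newPipeline.write(all, 0, all.length);
  }
}
{code}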

We need an infrastructure to communicate the error status back to upstream DNs instead of
just terminating, so that we don't incur this run-time cost most of the time (assuming
checksum errors happen infrequently).
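
The message that carries the error upstream could look like the sketch below. The class
shape is hypothetical, loosely modelled on the ERROR_CHECKSUM status that the data transfer
protocol already defines, with the corrupt offset added so the upstream DN knows exactly
what to re-verify.

{code:java}
public class ChecksumErrorAck {
  public enum Status { SUCCESS, ERROR, ERROR_CHECKSUM }

  private final Status status;
  private final long corruptChunkOffset;   // only meaningful for ERROR_CHECKSUM

  public ChecksumErrorAck(Status status, long corruptChunkOffset) {
    this.status = status;
    this.corruptChunkOffset = corruptChunkOffset;
  }

  public Status getStatus() { return status; }

  /** Offset the upstream DN should re-verify on its own replica. */
  public long getCorruptChunkOffset() { return corruptChunkOffset; }
}
{code}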

I'd prefer working out a functioning solution for the current issue, and filing new jiras to
handle corner cases that were not handled previously.


> Another issue in handling checksum errors in write pipeline
> -----------------------------------------------------------
>
>                 Key: HDFS-6937
>                 URL: https://issues.apache.org/jira/browse/HDFS-6937
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>            Assignee: Colin Patrick McCabe
>
> Given a write pipeline:
> DN1 -> DN2 -> DN3
> DN3 detected a checksum error and terminated; DN2 truncates its replica to the ACKed size.
> Then a new pipeline is attempted as
> DN1 -> DN2 -> DN4
> DN4 detects a checksum error again. Later, when DN4 was replaced with DN5 (and so on), it
> failed for the same reason. This led to the observation that DN2's data is corrupted.
> Found that the software currently truncates DN2's replica to the ACKed size after DN3
> terminates, but it doesn't check the correctness of the data already written to disk.
> So intuitively, a solution would be: when the downstream DN (DN3 here) finds a checksum
> error, it propagates this info back to the upstream DN (DN2 here); DN2 then checks the
> correctness of the data already written to disk and truncates the replica to
> MIN(correctDataSize, ACKedSize).
> Found this issue is similar to what was reported by HDFS-3875, and the truncation at
> DN2 was actually introduced as part of the HDFS-3875 solution.
> Filing this jira for the issue reported here. HDFS-3875 was filed by [~tlipcon],
> who proposed something similar there:
> {quote}
> if the tail node in the pipeline detects a checksum error, then it returns a special
> error code back up the pipeline indicating this (rather than just disconnecting)
> if a non-tail node receives this error code, then it immediately scans its own block
> on disk (from the beginning up through the last acked length). If it detects a corruption
> on its local copy, then it should assume that it is the faulty one, rather than the
> downstream neighbor. If it detects no corruption, then the faulty node is either the
> downstream mirror or the network link between the two, and the current behavior is
> reasonable.
> {quote}
> Thanks.
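
The truncation rule quoted in the description above reduces to a one-line computation; as a
tiny worked example (names illustrative, not the actual DataNode code):

{code:java}
public class ReplicaTruncation {
  /** New replica length = MIN(correctDataSize, ACKedSize). */
  public static long newReplicaLength(long correctDataSize, long ackedSize) {
    return Math.min(correctDataSize, ackedSize);
  }

  public static void main(String[] args) {
    // 6 MB verified correct on disk, but only 5 MB acked to the client:
    // the replica is truncated to the 5 MB acked prefix.
    System.out.println(newReplicaLength(6L << 20, 5L << 20)); // 5242880
  }
}
{code}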



--
This message was sent by Atlassian JIRA
(v6.2#6252)
