hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE
Date Tue, 06 Sep 2016 14:37:20 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-10714:
---------------------------------
    Attachment: HADOOP-10714-01-draft.patch

Here is an initial approach, based on #2 as mentioned by [~brahmareddy].

1. The pipeline is DN1->DN2->DN3.
2. DN3 gets a ChecksumException, sends the CHECKSUM error ack upstream, and shuts itself
down.
3. DN2 receives the ack and, before forwarding it upstream, verifies its local replica's
checksum.
4. If DN2 also finds a checksum error, then DN1 possibly has the error as well. So DN2 also
marks itself CHECKSUM_ERROR, sends the reply upstream, and shuts itself down.

This way, every DN's replica is verified before the ack reaches the client.
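
The steps above can be sketched as a small simulation. This is only a toy model in Python, not code from the attached patch or from Hadoop: the DataNode class, the replica_corrupt flag and the propagate_ack function are hypothetical names. It also assumes that a DN whose local replica verifies cleanly reports SUCCESS for itself and forwards the downstream statuses unchanged, which the description above does not spell out.

```python
from dataclasses import dataclass

SUCCESS = "SUCCESS"
ERROR_CHECKSUM = "ERROR_CHECKSUM"


@dataclass
class DataNode:
    name: str
    replica_corrupt: bool  # hypothetical: would a local checksum verification fail?
    running: bool = True


def propagate_ack(pipeline):
    """Walk the ack from the last DN back towards the client.

    Returns a dict mapping DN name to its reply status, in pipeline order.
    """
    statuses = []  # collected downstream-first
    for dn in reversed(pipeline):
        is_last = not statuses
        if is_last:
            # The last DN checks the data as it receives it.
            corrupt = dn.replica_corrupt
        else:
            # Upstream DNs verify their replica only when a downstream DN
            # has reported a checksum error.
            corrupt = ERROR_CHECKSUM in statuses and dn.replica_corrupt
        if corrupt:
            dn.running = False  # the DN marks itself bad and shuts down
            statuses.append(ERROR_CHECKSUM)
        else:
            statuses.append(SUCCESS)  # assumption: a clean replica reports SUCCESS
    statuses.reverse()
    return dict(zip((dn.name for dn in pipeline), statuses))


# Only DN3 received corrupt data, so only DN3 is reported as bad.
pipeline = [DataNode("DN1", False), DataNode("DN2", False), DataNode("DN3", True)]
print(propagate_ack(pipeline))
```

In this model the client sees ERROR_CHECKSUM only for DN3 when the upstream replicas are intact, so it has enough information to avoid blaming DN2.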

Please review and give suggestions.

> Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10714
>                 URL: https://issues.apache.org/jira/browse/HDFS-10714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>
> We came across an issue where a write failed even though 7 DNs were available, due to a
> network fault at one datanode that was LAST_IN_PIPELINE. It is similar to HDFS-6937.
> Scenario: (DN3 has N/W fault and Min repl=2).
> Write pipeline:
> DN1->DN2->DN3 => DN3 gives ERROR_CHECKSUM ack, and so DN2 is marked as bad
> DN1->DN4->DN3 => DN3 gives ERROR_CHECKSUM ack, and so DN4 is marked as bad
> ...
> And so on (DN3 is LAST_IN_PIPELINE every time), continuing until there are no more
> datanodes left to construct the pipeline.
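
The failure pattern in the quoted description can be illustrated with a short simulation. This is a toy model under stated assumptions, not HDFS client code: simulate_current_behavior is a hypothetical function, the first DN is assumed to stay fixed and the faulty DN to stay last (as in the two pipelines shown), and the order in which replacement DNs are chosen is made up.

```python
def simulate_current_behavior(datanodes, faulty_last):
    """Return (attempted pipelines, DNs marked bad) for the reported scenario."""
    first = datanodes[0]
    # Every DN other than the first one and the faulty one can take the middle slot.
    candidates = [dn for dn in datanodes[1:] if dn != faulty_last]
    attempts = []
    marked_bad = []
    for middle in candidates:
        attempts.append((first, middle, faulty_last))
        # The faulty last DN replies ERROR_CHECKSUM, and the DN directly
        # upstream of it is the one that gets marked as bad.
        marked_bad.append(middle)
    return attempts, marked_bad


datanodes = ["DN1", "DN2", "DN3", "DN4", "DN5", "DN6", "DN7"]
attempts, marked_bad = simulate_current_behavior(datanodes, faulty_last="DN3")
for pipeline in attempts:
    print("->".join(pipeline))
print("marked bad:", marked_bad)
```

In this model DN3 is never marked bad, so every healthy candidate is used up before the write gives up.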



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

