hadoop-hdfs-dev mailing list archives

From Brahma Reddy Battula <brahmareddy.batt...@huawei.com>
Subject Issue in handling checksum errors in write pipeline
Date Sat, 30 Jul 2016 11:03:59 GMT

We came across an issue where a write fails even though 7 DNs are available, due to a network
fault at one datanode which is LAST_IN_PIPELINE. It is similar to HDFS-6937.

Scenario: DN3 has a network fault, and min replication = 2.

Write pipeline:
DN1 -> DN2 -> DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN2 is marked as bad.
DN1 -> DN4 -> DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN4 is marked as bad.
And so on (DN3 is LAST_IN_PIPELINE every time)... this continues until there are no more
datanodes left to construct the pipeline.
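A toy simulation may make the failure mode clearer (this is illustrative code, not actual DFSOutputStream logic; all names are hypothetical): because the faulty DN3 always reports ERROR_CHECKSUM from the last position, the client repeatedly blames and replaces DN3's upstream neighbour, never ejecting DN3 itself, until the spare nodes run out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the failure loop described above -- not HDFS code.
public class PipelineExhaustion {
    // Returns how many pipeline rebuilds happen before spares are exhausted.
    // The faulty last node is never replaced: each round, its upstream
    // neighbour is marked bad and swapped for a spare.
    public static int rebuildsUntilExhausted(List<String> spares, String faultyLastNode) {
        Deque<String> remaining = new ArrayDeque<>(spares);
        List<String> pipeline = new ArrayList<>(
            Arrays.asList("DN1", remaining.pop(), faultyLastNode));
        int rebuilds = 0;
        while (!remaining.isEmpty()) {
            // ERROR_CHECKSUM ack from faultyLastNode: blame its upstream.
            pipeline.set(1, remaining.pop());
            rebuilds++;
        }
        return rebuilds; // no spares left -> the write fails
    }

    public static void main(String[] args) {
        // 7 DNs total in this scenario: DN1, faulty DN3, and five spares.
        List<String> spares = Arrays.asList("DN2", "DN4", "DN5", "DN6", "DN7");
        System.out.println(rebuildsUntilExhausted(spares, "DN3"));
    }
}
```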

We are thinking of handling it as follows:

Instead of throwing an IOException for an ERROR_CHECKSUM ack from downstream, we could send
back the pipeline ack, and on the client side replace both DN2 and DN3 with new nodes, since
we cannot tell which of the two actually has the network problem.
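The proposed replacement policy could be sketched like this (a minimal illustration with hypothetical names, not a patch against DFSOutputStream): on an ERROR_CHECKSUM ack, treat both ends of the suspect link as replacement candidates, since the fault could be on either side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed handling -- not actual HDFS code.
public class ChecksumErrorPolicy {
    // Given the pipeline and the index of the node that reported
    // ERROR_CHECKSUM, return BOTH the reporting node and its immediate
    // upstream neighbour as candidates for replacement, because we cannot
    // tell which side of the link has the network fault.
    public static List<String> nodesToReplace(List<String> pipeline, int errorIndex) {
        List<String> suspects = new ArrayList<>();
        if (errorIndex > 0) {
            suspects.add(pipeline.get(errorIndex - 1)); // upstream (e.g. DN2)
        }
        suspects.add(pipeline.get(errorIndex));         // reporter (e.g. DN3)
        return suspects;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("DN1", "DN2", "DN3");
        // DN3 (index 2) reports ERROR_CHECKSUM: replace both DN2 and DN3.
        System.out.println(nodesToReplace(pipeline, 2));
    }
}
```

Under the scenario above, this ejects the genuinely faulty DN3 on the first recovery round instead of burning through every spare datanode.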

Please give your views on this possible fix.

--Brahma Reddy Battula
