hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers
Date Mon, 28 Sep 2015 07:56:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14910134#comment-14910134 ]

Zhe Zhang commented on HDFS-9079:
---------------------------------

Thanks for the helpful comment, Walter.

bq. setupPipelineForAppendOrRecovery() will trim bad nodes. When nodes.length==0, the failed
streamer won't call updateBlockForPipeline(). That's one reason you need HDFS-9040.
Agreed, the overridden {{updatePipelineInternal}} logic in HDFS-9040 will address this issue.
As explained above, this patch will be rebased on top of HDFS-9040 once HDFS-9040 is committed.

bq. The new way delays updatePipeline. One failure doesn't call it, only endBlock() will.
The code segment in {{case DN_ACCEPT_GS}} also updates the NN copy of the block (storedBlock).
The protocol is that once all healthy DNs accept the proposed GS, we update NN (also described
[here | https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741972]).
This guarantees no "false-stale", meaning a fresh internal block will never be considered
stale. But fundamentally it's hard to prevent "false-fresh". We can only try to shorten the
window where a stale internal block can be considered fresh.
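The "update NN only after all healthy DNs accept" step can be sketched roughly as below. This is a hypothetical simplification for illustration only; the class and method names are made up and do not reflect the actual HDFS-9040 coordinator code:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the DN_ACCEPT_GS handling described above: the
// coordinating streamer proposes a new GS to every healthy streamer, and
// the NN copy of the block (storedBlock) is updated only once ALL healthy
// streamers have acked the proposed GS.
public class GsCoordinatorSketch {
  private final Set<Integer> healthy = new HashSet<>();
  private final Set<Integer> accepted = new HashSet<>();
  private long storedBlockGs;

  public GsCoordinatorSketch(long initialGs, Collection<Integer> healthyStreamers) {
    this.storedBlockGs = initialGs;
    this.healthy.addAll(healthyStreamers);
  }

  /** Called when streamer i acks the proposed GS (the DN_ACCEPT_GS case). */
  public boolean onAccept(int streamerIndex, long proposedGs) {
    accepted.add(streamerIndex);
    if (accepted.containsAll(healthy)) {
      storedBlockGs = proposedGs; // update the NN copy only now
      return true;
    }
    return false; // still waiting for other healthy streamers
  }

  public long getStoredBlockGs() {
    return storedBlockGs;
  }
}
```

Because the NN GS only moves forward after every healthy DN already holds the new GS, a fresh internal block can never end up behind the NN copy, which is the "no false-stale" guarantee.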

bq. Assume client gets killed before endBlock(). Now every block can be accepted by blockReport.
It affects lease recovery's judgement.
My plan is to bump the GS of the NN's storedBlock to 1004 (1001 + NUM_PARITY_BLOCKS) during
lease recovery. A healthy streamer also bumps the GS of its internal block (the DN's copy of
the GS) to 1005 when it successfully finishes writing the internal block.
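The arithmetic above can be made concrete with a small sketch (assuming NUM_PARITY_BLOCKS = 3, the RS-6-3 default on the HDFS-7285 branch, and a base GS of 1001; the helper names are illustrative only):

```java
// Illustrative GS bumping from the plan above: lease recovery bumps the NN's
// storedBlock to base + NUM_PARITY_BLOCKS, while a streamer that finished its
// internal block has already bumped one further, so successfully finished
// internal blocks stay strictly newer than the lease-recovery GS.
public class GsBumpSketch {
  static final int NUM_PARITY_BLOCKS = 3; // RS-6-3 default

  /** GS that lease recovery assigns to the NN's storedBlock. */
  static long leaseRecoveryGs(long baseGs) {
    return baseGs + NUM_PARITY_BLOCKS; // 1001 -> 1004
  }

  /** GS a healthy streamer applies after finishing its internal block. */
  static long finishedStreamerGs(long baseGs) {
    return baseGs + NUM_PARITY_BLOCKS + 1; // 1001 -> 1005
  }
}
```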

bq. updatePipeline() is called when overlapping failures finally get handled, or just before
endBlock()? 
See above, it's called "when overlapping failures finally get handled". It will also be
called during {{endBlock}}.

> Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9079
>                 URL: https://issues.apache.org/jira/browse/HDFS-9079
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9079-HDFS-7285.00.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on NN
> {code}
> To simplify the above we can preallocate GS when NN creates a new striped block group
> ({{FSN#createNewBlock}}). For each new striped block group we can reserve {{NUM_PARITY_BLOCKS}}
> GS's. Then steps 1~3 in the above sequence can be saved. If more than {{NUM_PARITY_BLOCKS}}
> errors have happened we shouldn't try to further recover anyway.
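The preallocation idea in the description could look roughly like the following. This is a hypothetical sketch only (the class name and NUM_PARITY_BLOCKS = 3 are illustrative assumptions, not the actual {{FSN#createNewBlock}} implementation):

```java
// Hypothetical sketch: the NN reserves NUM_PARITY_BLOCKS extra generation
// stamps when a striped block group is created, so a streamer hitting an
// error can take the next reserved GS locally instead of doing a NN round
// trip (saving steps 1~3 of the sequence above).
public class PreallocatedGsSketch {
  static final int NUM_PARITY_BLOCKS = 3; // RS-6-3 default
  private final long baseGs;
  private int used = 0;

  public PreallocatedGsSketch(long baseGs) {
    this.baseGs = baseGs;
  }

  /**
   * Returns the next reserved GS, or -1 once the budget is exhausted:
   * with more than NUM_PARITY_BLOCKS failures the block group is
   * unrecoverable, so no further GS should be handed out.
   */
  public long nextGs() {
    if (used >= NUM_PARITY_BLOCKS) {
      return -1;
    }
    return baseGs + (++used);
  }
}
```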



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
