hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers
Date Mon, 12 Oct 2015 20:07:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953652#comment-14953652 ]

Zhe Zhang commented on HDFS-9079:

Thanks for the comments, Walter!

It's a very good point that the current patch doesn't handle failures of the streamer threads.
Since the change is already quite large, maybe we can leave that as a separate JIRA, if we
at least agree on the basic direction of this JIRA? I'll try to rev the patch to complete
the handling of DN failures, and try to add some basic handling of streamer thread failures.

I'm currently debugging the patch against {{TestDFSStripedOutputStreamWithFailure}}. I think
the logic of allocating multiple genStamps goes against some assumptions in {{runTest}}. The test
passes whenever I run a single configuration of the parameter set below (e.g., if I change
{{runTestWithMultipleFailure}} to test only a single entry in {{dnIndexSuite}}), but it fails
when multiple configurations run.
{code}
private void runTest(final int length, final int[] killPos,
      final int[] dnIndex, final boolean tokenExpire) throws Exception {
{code}

> Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers
> ------------------------------------------------------------------------------------------------
>                 Key: HDFS-9079
>                 URL: https://issues.apache.org/jira/browse/HDFS-9079
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on NN
> {code}
> To simplify the above, we can preallocate GS when the NN creates a new striped block group
> ({{FSN#createNewBlock}}). For each new striped block group we can reserve {{NUM_PARITY_BLOCKS}}
> GS's. Steps 1~3 in the above sequence can then be skipped. If more than {{NUM_PARITY_BLOCKS}}
> errors have happened, we shouldn't try to recover further anyway.
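
For illustration only, the preallocation idea in the description above could be sketched roughly as follows. This is not code from the patch: the class {{PreallocatedGenStamps}}, the method {{nextOnError}}, the value 3 for {{NUM_PARITY_BLOCKS}}, and the assumption that the reserved GS's form a contiguous range right after the block group's initial GS are all hypothetical.

```java
/**
 * Hypothetical sketch of client-side use of preallocated generation stamps.
 * Assumes the NN reserved NUM_PARITY_BLOCKS consecutive GS values
 * immediately after the block group's initial GS.
 */
final class PreallocatedGenStamps {
    // Assumed value for illustration (e.g. 3 parity blocks in an RS(6,3) schema).
    static final int NUM_PARITY_BLOCKS = 3;

    private final long initialGenStamp;
    private int used = 0;

    PreallocatedGenStamps(long initialGenStamp) {
        this.initialGenStamp = initialGenStamp;
    }

    /**
     * Called when a streamer finds an error. Returns the next reserved GS
     * without a round trip to the NN (replacing steps 1~3), or throws once
     * more than NUM_PARITY_BLOCKS errors have occurred.
     * Synchronized so that updates from several streamers are serialized.
     */
    synchronized long nextOnError() {
        if (used >= NUM_PARITY_BLOCKS) {
            throw new IllegalStateException(
                "More than " + NUM_PARITY_BLOCKS + " errors; not recovering further");
        }
        used++;
        return initialGenStamp + used;
    }

    public static void main(String[] args) {
        PreallocatedGenStamps gs = new PreallocatedGenStamps(1000L);
        for (int i = 0; i < NUM_PARITY_BLOCKS; i++) {
            System.out.println("error " + (i + 1) + " -> GS " + gs.nextOnError());
        }
    }
}
```

With this shape, a fourth call to {{nextOnError}} throws instead of handing out another GS, which mirrors the point that recovery should stop after {{NUM_PARITY_BLOCKS}} errors.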

This message was sent by Atlassian JIRA
