hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
Date Fri, 29 May 2015 00:28:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563982#comment-14563982 ]

Zhe Zhang commented on HDFS-8254:
---------------------------------

Thanks Nicholas for the patch! Our initial design put the {{locateFollowingBlock}} logic in the lead
streamer for simplicity. I think it's a great idea to remove that single point of failure.

# The new {{ConcurrentPoll}} class looks good overall. Let me know if this understanding is
correct: the fastest streamer now takes care of allocating the block group from the NN and distributing
the blocks to the other streamers. Can we add some Javadoc for the class and its methods, and ideally a design
description on the JIRA?
# The concurrent logic is a little complex and some parts could be fragile. For example, the
{{populate}} method in {{locateFollowingBlock}} directly changes the {{block}} of the class.
It's true that {{locateFollowingBlock}} is only used by {{nextBlockOutputStream}}, which will
reassign a correct value for {{block}}. But this dependency makes {{locateFollowingBlock}}
less self-contained. It also looks like we could run into a race condition if two streamers
enter {{locateFollowingBlock}} around the same time: they could both pass {{isReady2Populate}}
before either one has started taking from {{endBlocks}}. Since {{DataStreamer#locateFollowingBlock}}
is not complex, can we do some refactoring and move it to the output stream level? This way
the {{coordinator}} can take care of the main logic and the fastest streamer just has to trigger
it. I haven't thought through {{updateBlockForPipeline}} and {{updatePipeline}} yet but I
guess the stories should be similar.
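
To make the concern in the second point concrete, here is a minimal, self-contained sketch of the shape I have in mind. It is not code from the patch: {{StreamersCoordinatorSketch}}, {{groupsAllocated}}, {{allocationCalls}} and {{runDemo}} are hypothetical names, the blocks are plain strings, and the NN call is replaced by a counter. The point is only that the "is this group already allocated?" check and the distribution to all streamers happen under one lock, so two streamers arriving together cannot both allocate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; none of these names come from the patch. */
class StreamersCoordinatorSketch {
  // One unbounded queue per streamer, holding that streamer's block of each group.
  private final List<BlockingQueue<String>> queues = new ArrayList<>();
  // Number of block groups allocated so far; guarded by "this".
  private int groupsAllocated = 0;
  // Stand-in for the number of allocation RPCs sent to the NameNode.
  final AtomicInteger allocationCalls = new AtomicInteger(0);

  StreamersCoordinatorSketch(int numStreamers) {
    for (int i = 0; i < numStreamers; i++) {
      queues.add(new LinkedBlockingQueue<>());
    }
  }

  /** Any streamer may trigger allocation of group groupSeq; only the first one does. */
  String locateFollowingBlock(int streamerIndex, int groupSeq)
      throws InterruptedException {
    synchronized (this) {
      // Check and act are atomic: a second streamer sees the updated counter.
      if (groupsAllocated == groupSeq) {
        groupsAllocated++;
        allocationCalls.incrementAndGet(); // assumption: the real NN call goes here
        for (int i = 0; i < queues.size(); i++) {
          queues.get(i).add("group" + groupSeq + "-block" + i);
        }
      }
    }
    // Wait outside the lock for this streamer's own block.
    return queues.get(streamerIndex).take();
  }

  /** Runs numStreamers threads that each request numGroups blocks; returns allocation count. */
  static int runDemo(int numStreamers, int numGroups) {
    StreamersCoordinatorSketch c = new StreamersCoordinatorSketch(numStreamers);
    AtomicInteger mismatches = new AtomicInteger(0);
    List<Thread> threads = new ArrayList<>();
    for (int s = 0; s < numStreamers; s++) {
      final int index = s;
      Thread t = new Thread(() -> {
        try {
          for (int g = 0; g < numGroups; g++) {
            String block = c.locateFollowingBlock(index, g);
            if (!block.equals("group" + g + "-block" + index)) {
              mismatches.incrementAndGet();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    if (mismatches.get() != 0) {
      throw new IllegalStateException("a streamer received a block out of order");
    }
    return c.allocationCalls.get();
  }

  public static void main(String[] args) {
    System.out.println("allocations=" + runDemo(3, 4));
  }
}
```

Holding a lock across the real NN RPC may not be acceptable in the actual client, so this only illustrates the invariant (one allocation per block group, blocks delivered in order), not a proposed implementation.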

Nits:
# {{DFSStripedOutputStream}} already has the schema but the streamers are still using constants.
We should either use the schema or at least add some TODOs.
# New methods in {{DataStreamer}} could use some Javadoc.
# {{class Coordinator}} could be renamed to something like {{StreamersCoordinator}}, just
to be more specific.
# The Javadoc of {{StripedBlockUtil#checkBlocks}} should say that it checks that the two blocks
are in the same block group.

> In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-8254
>                 URL: https://issues.apache.org/jira/browse/HDFS-8254
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h8254_20150526.patch, h8254_20150526b.patch
>
>
> StripedDataStreamer javadoc is shown below.
> {code}
>  * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}.
>  * There are two kinds of StripedDataStreamer, leading streamer and ordinary
>  * stream. Leading streamer requests a block group from NameNode, unwraps
>  * it to located blocks and transfers each located block to its corresponding
>  * ordinary streamer via a blocking queue.
> {code}
> Leading streamer is the streamer with index 0.  When the datanode of the leading streamer
> fails, the other streamers cannot continue since no one will request a block group from NameNode
> anymore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
