hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
Date Mon, 28 Sep 2015 20:07:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933899#comment-14933899 ]

Zhe Zhang commented on HDFS-9040:
---------------------------------

I think the latest patch looks pretty good -- thanks Jing for the great work! A few comments
below. Most of them can be addressed separately. If we all agree upon the direction of HDFS-9079
I'm happy to make the changes there too.

* When writing the first block in the file, or if the streamer is the fastest to finish a
block, {{followingBlocks}} might not be ready when the code below is reached. This can happen,
for example, if the RPC call {{addBlock}} is slow, or if the client pauses between writing the
last chunk of block_0 and the first chunk of block_1. Should we {{take}} instead of {{poll}}?
{code}
  /**
   * The upper level DFSStripedOutputStream will allocate the new block group.
   * Each striped data streamer only needs to fetch from the queue, which
   * should already be ready.
   */
  private LocatedBlock getFollowingBlock() throws IOException {
{code}
* The rest of the error-handling logic looks good. {{writeChunk}} => {{checkStreamerFailures}}
is the key sync point here. I agree we should let this JIRA focus on the main logic and dedicate
HDFS-9098 to testing.
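
To make the {{poll}} versus {{take}} question in the first comment concrete, here is a minimal standalone sketch. It is not code from the patch: it assumes {{followingBlocks}} behaves like a {{java.util.concurrent.BlockingQueue}}, and the class name, helper names, and the String stand-in for {{LocatedBlock}} are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; a String stands in for LocatedBlock.
class FollowingBlockQueueSketch {

    // Non-blocking fetch: returns null right away if nothing is queued yet.
    static String fetchWithPoll(BlockingQueue<String> q) {
        return q.poll();
    }

    // Blocking fetch: waits until the allocator has queued a block.
    static String fetchWithTake(BlockingQueue<String> q) throws InterruptedException {
        return q.take();
    }

    // Bounded wait: returns null only if nothing arrives within the timeout.
    static String fetchWithTimedPoll(BlockingQueue<String> q, long timeoutMs)
            throws InterruptedException {
        return q.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Simulates a slow addBlock RPC that enqueues the block after a delay.
    static Thread slowAllocator(BlockingQueue<String> q, long delayMs, String block) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMs);
                q.put(block);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> followingBlocks = new LinkedBlockingQueue<>();
        slowAllocator(followingBlocks, 200, "block_1");
        // The streamer arrives before the allocator: poll sees an empty queue.
        System.out.println("poll -> " + fetchWithPoll(followingBlocks));
        // take waits for the allocator instead of failing early.
        System.out.println("take -> " + fetchWithTake(followingBlocks));
    }
}
```

A timed {{poll}} is a middle ground if an unbounded wait is a concern, since the streamer can then surface an error after the timeout rather than hang.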

Nits:
* {{callUpdatePipeline}} can now be folded into {{updatePipeline}}
* {{updatePipelineInternal}} is not an "internal" method of {{updatePipeline}}, so maybe rename it to {{setupPipelineInternal}}?

Long-term:
* The current subclassing structure of {{DFSOutputStream}} and {{DataStreamer}} is not ideal.
The striped subclasses inherit some unnecessary complexity. Meanwhile, we need to add hooks
in the superclass that only make sense for the striped subclass. We can think about
separating out a true common superclass for both the contiguous and striped output logic.
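
One possible shape for the long-term idea above, purely as a sketch: every class and method name here is hypothetical and none of it reflects the actual {{DFSOutputStream}} or {{DataStreamer}} code. The shared base holds only the common write path, and each subclass supplies its own behavior, so striped-only hooks no longer live in the contiguous class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common base: only logic shared by both output paths.
abstract class OutputLogicSketch {
    protected final List<String> log = new ArrayList<>();

    // Shared write path; subclasses customize the two steps below.
    final void writeChunk(byte[] chunk) {
        checkFailures();
        log.add(route(chunk));
    }

    // Where a chunk goes differs between contiguous and striped layouts.
    protected abstract String route(byte[] chunk);

    // Default is a no-op, so the contiguous path carries no striped hooks.
    protected void checkFailures() { }

    List<String> log() {
        return log;
    }
}

// Contiguous layout: a single pipeline, nothing striped-specific.
final class ContiguousSketch extends OutputLogicSketch {
    @Override
    protected String route(byte[] chunk) {
        return "pipeline:" + chunk.length;
    }
}

// Striped layout: round-robin over streamers, plus its own failure check.
final class StripedSketch extends OutputLogicSketch {
    private final int numStreamers;
    private int next;

    StripedSketch(int numStreamers) {
        this.numStreamers = numStreamers;
    }

    @Override
    protected String route(byte[] chunk) {
        return "streamer" + (next++ % numStreamers) + ":" + chunk.length;
    }

    @Override
    protected void checkFailures() {
        log.add("checkStreamerFailures");
    }
}

class SuperclassSketchDemo {
    public static void main(String[] args) {
        OutputLogicSketch striped = new StripedSketch(3);
        striped.writeChunk(new byte[4]);
        System.out.println(striped.log());
    }
}
```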

+1 pending a clarification of the first comment.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Jing Zhao
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch,
HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.006.patch,
HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, and StripedDataStreamers only have to stream blocks to DNs.-
> Proposal 2:
> See below the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388] from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
