hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
Date Tue, 15 Sep 2015 23:02:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746443#comment-14746443 ]

Jing Zhao commented on HDFS-9040:

Thanks for the great review, Walter and Zhe!

bq. Speaking blockToken, it reminds me another severe issue.

Yes, this can be an issue and we should fix it. But at this stage it may not be that severe:
the default block token lifetime (600 min) should be long enough to cover normal write
scenarios. Also, slow writers may not be our main use case in phase I, especially considering
that we do not support hflush/hsync yet, so HBase cannot use EC files. Creating streams before
having real data can be a good idea. Maybe we should create a JIRA for this?
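To make the lifetime argument concrete, here is a rough back-of-the-envelope sketch. It only compares the time a writer needs to fill one block against the 600-minute default mentioned above. The class name, the 128 MiB block size, and the write rates are illustrative assumptions, not values taken from the patch, and real token handling in HDFS involves more than this arithmetic.

```java
import java.time.Duration;

// Illustrative only: a hypothetical helper, not part of HDFS.
class BlockTokenLifetimeSketch {
    // Default block token lifetime cited in the comment above: 600 minutes.
    static final Duration DEFAULT_TOKEN_LIFETIME = Duration.ofMinutes(600);

    /** Time needed to fill one block at a constant write rate (rounded up to whole seconds). */
    static Duration timeToWriteBlock(long blockSizeBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        long seconds = (blockSizeBytes + bytesPerSecond - 1) / bytesPerSecond;
        return Duration.ofSeconds(seconds);
    }

    /** True if writing the block would take longer than the token lifetime. */
    static boolean tokenMayExpire(long blockSizeBytes, long bytesPerSecond, Duration lifetime) {
        return timeToWriteBlock(blockSizeBytes, bytesPerSecond).compareTo(lifetime) > 0;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size of 128 MiB

        // Assumed "normal" writer at 1 MiB/s: 128 seconds, far below 600 minutes.
        System.out.println(tokenMayExpire(blockSize, 1024L * 1024, DEFAULT_TOKEN_LIFETIME));

        // Assumed very slow writer at 1 KiB/s: 131072 seconds, above 36000 seconds.
        System.out.println(tokenMayExpire(blockSize, 1024L, DEFAULT_TOKEN_LIFETIME));
    }
}
```

Under these assumptions only the very slow writer outlives its token, which matches the point that the problem is real but unlikely to bite in phase I.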

bq. Since we have agreed to move the locateFollowingBlock logic to OutputStream level, we
should limit the lifespan of a StripedDataStreamer to a single block.

This is a good point. In my current patch, only failed streamers are replaced when writing
a new block. Replacing all the streamers could be even simpler. My only concern is the overhead
of creating new threads.
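The trade-off between the two policies can be sketched as follows. This is a toy model, not the patch: `Streamer` here is a hypothetical stand-in for StripedDataStreamer, the group width of 9 is an assumption (6 data + 3 parity), and the returned count stands for the number of new threads each policy would create at a block boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical classes, not the HDFS implementation.
class StreamerReplacementSketch {
    /** Stand-in for a per-block streamer thread. */
    static final class Streamer {
        final int index;
        boolean failed;
        Streamer(int index) { this.index = index; }
    }

    /** Builds a group of healthy streamers, one per internal block. */
    static List<Streamer> newGroup(int width) {
        List<Streamer> group = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            group.add(new Streamer(i));
        }
        return group;
    }

    /** Policy A: replace only failed streamers. Returns how many were created. */
    static int replaceFailedOnly(List<Streamer> streamers) {
        int created = 0;
        for (int i = 0; i < streamers.size(); i++) {
            if (streamers.get(i).failed) {
                streamers.set(i, new Streamer(i));
                created++;
            }
        }
        return created;
    }

    /** Policy B: replace every streamer for each new block. Returns how many were created. */
    static int replaceAll(List<Streamer> streamers) {
        for (int i = 0; i < streamers.size(); i++) {
            streamers.set(i, new Streamer(i));
        }
        return streamers.size();
    }

    public static void main(String[] args) {
        List<Streamer> group = newGroup(9); // assumed 6 data + 3 parity streamers
        group.get(4).failed = true;         // simulate one failed streamer

        System.out.println(replaceFailedOnly(group)); // one new streamer
        System.out.println(replaceAll(group));        // nine new streamers
    }
}
```

Policy B has no per-streamer state to inspect, which is why it can be simpler, but it pays the thread-creation cost for the whole group on every block.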

bq. We can also consider refactoring the base DataStreamer class into BlockDataStreamer

Maybe we can do the refactoring after merging the EC feature into trunk? Before the merge, we
may want to minimize changes to the original write pipeline.

I will upload a new patch soon to fix the race conditions pointed out by Walter.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch,
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and StripedDataStreamers
> only have to stream blocks to DNs.
> Proposal 2:
> See below the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
> from [~jingzhao].
