Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 15 Sep 2015 23:02:46 +0000 (UTC)
From: "Jing Zhao (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12862620.1441763523000.348183.1442358166430@Atlassian.JIRA>
In-Reply-To: <JIRA.12862620.1441763523000@Atlassian.JIRA>
References: <JIRA.12862620.1441763523000@Atlassian.JIRA>
 <JIRA.12862620.1441763523795@arcas>
Subject: [jira] [Commented] (HDFS-9040) Erasure coding: Refactor
 DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746443#comment-14746443 ] 

Jing Zhao commented on HDFS-9040:
---------------------------------

Thanks for the great review, Walter and Zhe!

bq. Speaking blockToken, it reminds me another severe issue.

Yes, this can be an issue and we should fix it. But at this stage it may not be that severe: the default block token lifetime (600 min) should be long enough to cover normal write scenarios. Also, slow writers may not be our main use case in phase I, especially since we do not support hflush/hsync yet, so HBase cannot use EC files. Creating the streams before there is real data to write could be a good idea. Maybe we should create a jira for this?

bq. Since we have agreed to move the locateFollowingBlock logic to OutputStream level, we should limit the lifespan of a StripedDataStreamer to a single block.

This is a good point. In my current patch, only failed streamers are replaced when writing a new block. Replacing all the streamers could be even simpler. My only concern is the overhead of creating new threads.
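
To make the trade-off concrete, here is a minimal sketch of the two replacement strategies. It is only an illustration and not code from the patch: the Streamer class, the method names, and the group size of 9 are all hypothetical stand-ins, and the real StripedDataStreamer carries much more state.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamerReplacementSketch {
    /** Hypothetical stand-in for a per-block streamer; not the real StripedDataStreamer. */
    static final class Streamer {
        final int index;
        final boolean failed;

        Streamer(int index, boolean failed) {
            this.index = index;
            this.failed = failed;
        }
    }

    /** Strategy A: keep healthy streamers and replace only the failed ones. */
    static List<Streamer> replaceFailedOnly(List<Streamer> current) {
        List<Streamer> next = new ArrayList<>();
        for (Streamer s : current) {
            next.add(s.failed ? new Streamer(s.index, false) : s);
        }
        return next;
    }

    /** Strategy B: create a fresh streamer for every index at each block boundary. */
    static List<Streamer> replaceAll(List<Streamer> current) {
        List<Streamer> next = new ArrayList<>();
        for (Streamer s : current) {
            next.add(new Streamer(s.index, false));
        }
        return next;
    }

    /** Counts positions holding a different object, i.e. streamers that had to be created. */
    static int countNew(List<Streamer> before, List<Streamer> after) {
        int created = 0;
        for (int i = 0; i < before.size(); i++) {
            if (before.get(i) != after.get(i)) {
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        // Assumed group of 9 streamers with one failure at index 3.
        List<Streamer> group = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            group.add(new Streamer(i, i == 3));
        }
        System.out.println("failed-only creates: " + countNew(group, replaceFailedOnly(group)));
        System.out.println("replace-all creates: " + countNew(group, replaceAll(group)));
    }
}
```

Strategy B has no per-streamer branching, which is why it may be simpler, but in this sketch it creates a new object for every index on every block, which corresponds to the thread-creation cost mentioned above.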

bq. We can also consider refactoring the base DataStreamer class into BlockDataStreamer

Maybe we can do the refactoring after merging the EC feature into trunk? Before the merge we may want to minimize changes to the original write pipeline.

I will upload a new patch soon to fix the race conditions pointed out by Walter.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See below the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388] from [~jingzhao].

