hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
Date Thu, 18 Feb 2016 04:27:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151729#comment-15151729 ]

Mingliang Liu commented on HDFS-7661:

Thanks [~drankye] and [~sinall] for your prompt comments.

As I thought through the v2 design of this feature, I realized it is far from a simple
patch. I would welcome a collaborative effort to speed it up, since it has high priority
among the other EC sub-tasks (as discussed in [HDFS-9603]).

I agree with [~drankye] that we need to settle the design and approach first. Once that
is well discussed, we can split the work into smaller components. For write, these include:
* client-side hflush (mainly in {{DFSStripedOutputStream#flushOrSync}})
* the DN receiving the packets (mainly in {{BlockReceiver#receivePacket}})
* the DN appending and committing the BG (block group) length to the meta file, and parsing it for read
* FsDataset operations in the DN to support file overwrite (as commented by [~sinall])
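To make the BG-length bookkeeping concrete: once a DN knows the flushed BG length, it can derive how many bytes its own internal block should hold. The sketch below assumes RS(k, m) striping with a fixed cell size, data blocks at indices 0..k-1 and parity blocks at k..k+m-1; the class and method names are hypothetical, not existing HDFS APIs, and the toy cell size is for readability only.

```java
// Hypothetical helper, not an HDFS API: derives internal block lengths
// from a block group (BG) length under RS(k, m) striping.
final class InternalBlockLengths {

    /**
     * Length of internal block {@code idx} in a block group of {@code bgLength} bytes.
     * Indices 0..k-1 are data blocks; k..k+m-1 are parity blocks.
     */
    static long internalBlockLength(long bgLength, int cellSize, int k, int idx) {
        long stripeSize = (long) cellSize * k;
        long fullStripes = bgLength / stripeSize;
        long tail = bgLength % stripeSize;      // bytes in the trailing partial stripe
        long base = fullStripes * cellSize;     // one cell per full stripe in every internal block
        if (tail == 0) {
            return base;
        }
        if (idx >= k) {
            // A parity cell in the partial stripe is as long as the longest data cell (cell 0).
            return base + Math.min(cellSize, tail);
        }
        long cellStart = (long) idx * cellSize; // offset of this data cell within the stripe
        return base + Math.max(0L, Math.min(cellSize, tail - cellStart));
    }

    public static void main(String[] args) {
        int cellSize = 4, k = 3, m = 2;   // toy values, assumed for illustration
        long bgLength = 17;               // e.g. a BG length recorded after an hflush
        for (int idx = 0; idx < k + m; idx++) {
            System.out.println("internal block " + idx + ": "
                + internalBlockLength(bgLength, cellSize, k, idx));
        }
        // prints 8, 5, 4 for the data blocks and 8, 8 for the parity blocks
    }
}
```

The data lengths sum back to the BG length (8 + 5 + 4 = 17), while every parity block is as long as the first data block, which is why a parity DN cannot infer the BG length from its own block size alone.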

Meanwhile, for read requests:
* the parity DN calculating its safe length
* the client computing the maximum visible BG length
* the protocol in between, e.g. the DN may need to be aware of the EC policy to calculate its
safe length.
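One way the client could compute the maximum visible BG length, assuming each DN reports the BG length it has durably persisted for its internal block (for example, the value committed to its meta file): because any k internal blocks of an RS(k, m) group suffice to decode, data up to the k-th largest reported length is readable. This is a minimal sketch under that assumption, not the agreed design; the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not an HDFS API.
final class VisibleLength {

    /**
     * Returns the largest BG length that at least k internal blocks have persisted,
     * i.e. the k-th largest reported value. Assumes each report means the DN holds
     * its internal block data up to that BG length.
     */
    static long maxVisibleBgLength(long[] reportedBgLengths, int k) {
        if (k <= 0 || reportedBgLengths.length < k) {
            throw new IllegalArgumentException("need at least k = " + k + " reports");
        }
        long[] sorted = reportedBgLengths.clone();
        Arrays.sort(sorted);                  // ascending order
        return sorted[sorted.length - k];     // k-th largest
    }

    public static void main(String[] args) {
        // RS(3, 2): five DNs report the BG length each has persisted.
        long[] reports = {17, 12, 17, 12, 17};
        System.out.println(maxVisibleBgLength(reports, 3)); // prints 17
    }
}
```

If fewer than k DNs respond, the sketch refuses to answer rather than guess, since no length can then be proven decodable.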

Moreover, block reconstruction also needs to support {{hflush}}-ed files, which is not yet
covered by the current design.

Finally, we need to test the code thoroughly. Ideally, we should be able to test each
code segment independently, without involving too much context from other parts. End-to-end
tests are certainly needed when we bring them together.

I have probably missed something. While we discuss the design, my in-progress demo patch
focuses mainly on the client {{DFSStripedOutputStream#flushOrSync}} and the DN {{BlockReceiver#receivePacket}},
so [~sinall]'s overwrite-support code in FsDataset should be reused. For read requests,
I don't have any code yet.

> Erasure coding: support hflush and hsync
> ----------------------------------------
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, HDFS-7661-unitTest-wip-trunk.patch, HDFS-EC-file-flush-sync-design-version1.1.pdf, HDFS-EC-file-flush-sync-design-version2.0.pdf
> We also need to support hflush/hsync and visible length. 

This message was sent by Atlassian JIRA
