hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7729) Add logic to DFSOutputStream to support writing a file in striping layout
Date Thu, 12 Feb 2015 21:42:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319034#comment-14319034 ]

Jing Zhao commented on HDFS-7729:

My main concern is whether we can/should now define DataStreamer as a static class and
move it out of DFSOutputStream. Previously, since a DFSOutputStream and a DataStreamer had a
one-to-one mapping, DataStreamer was defined as a non-static inner class that directly accesses
and modifies the data fields of DFSOutputStream. A lot of logic is mixed together, which makes
this part of the code hard to follow. With the more complicated logic for striped blocks, the
code will become even harder to maintain. It may be better and clearer to make DataStreamer
a standalone class that only handles transferring the packets assigned to it from
outside.
And if we look at the current patch, we have already moved a lot of variables from DFSOutputStream
into DataStreamer and added setters/getters for them, so I guess we are already moving
in this direction. Moving DataStreamer out would also greatly reduce the number of lines
in DFSOutputStream and make the code more readable.
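
To make the proposed separation concrete, here is a minimal sketch of a streamer that owns its own packet queue and holds no reference to the output stream. All names (SketchPacket, SketchStreamer, enqueue, sentSeqnos) are hypothetical and are not the real HDFS classes or methods; the "send" step is simulated by recording sequence numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for a data packet; not the real HDFS packet class. */
final class SketchPacket {
  final long seqno;
  final byte[] data;
  final boolean last; // marks the final packet handed to the streamer

  SketchPacket(long seqno, byte[] data, boolean last) {
    this.seqno = seqno;
    this.data = data;
    this.last = last;
  }
}

/**
 * Standalone streamer: it only transfers packets that callers assign to it
 * and never touches the fields of the stream that created it.
 */
final class SketchStreamer implements Runnable {
  // Unbounded queue, so add() never blocks or fails for capacity reasons.
  private final BlockingQueue<SketchPacket> queue = new LinkedBlockingQueue<>();
  // Assumption: recording seqnos stands in for writing to a datanode.
  private final List<Long> sent = new ArrayList<>();

  /** Called from outside (for example, by the output stream) to assign work. */
  void enqueue(SketchPacket packet) {
    queue.add(packet);
  }

  @Override
  public void run() {
    try {
      while (true) {
        SketchPacket packet = queue.take(); // blocks until a packet arrives
        synchronized (sent) {
          sent.add(packet.seqno);
        }
        if (packet.last) {
          return; // stop after the final packet
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }

  /** Returns a snapshot of the sequence numbers processed so far. */
  List<Long> sentSeqnos() {
    synchronized (sent) {
      return new ArrayList<>(sent);
    }
  }
}
```

With this shape, the stream side only needs enqueue(), and a striped writer could create several such streamers without any of them sharing mutable state with the stream.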

[~szetszwo] and I will explore this direction in trunk. After this refactoring, we can extend
the current DFSOutputStream to DFSOutputStreamStriped, which contains the striping logic and
does not support append, hflush, and hsync for now. This way we may be able to minimize the risk
of breaking the logic for non-striping files.
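
As a rough illustration of the subclassing idea, the sketch below shows a base stream with hflush/hsync and a striped subclass that rejects them. The class names, the in-memory sink, and the choice of UnsupportedOperationException are assumptions made for this example; they are not taken from the patch or from the real DFSOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical base stream; an in-memory sink stands in for the write pipeline. */
class SketchOutputStream extends OutputStream {
  protected final ByteArrayOutputStream sink = new ByteArrayOutputStream();

  @Override
  public void write(int b) {
    sink.write(b);
  }

  public void hflush() throws IOException {
    flush(); // placeholder for the real flush-to-datanodes logic
  }

  public void hsync() throws IOException {
    flush(); // placeholder for the real sync-to-disk logic
  }

  int bytesWritten() {
    return sink.size();
  }
}

/** Striped variant: the striping logic would live here; hflush/hsync are rejected for now. */
class SketchOutputStreamStriped extends SketchOutputStream {
  @Override
  public void hflush() {
    throw new UnsupportedOperationException("hflush is not supported for striped files");
  }

  @Override
  public void hsync() {
    throw new UnsupportedOperationException("hsync is not supported for striped files");
  }
}
```

Because the base class is left unchanged, callers writing non-striped files keep the existing behavior, which is the risk-reduction point made above.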

> Add logic to DFSOutputStream to support writing a file in striping layout 
> --------------------------------------------------------------------------
>                 Key: HDFS-7729
>                 URL: https://issues.apache.org/jira/browse/HDFS-7729
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: Codec-tmp.patch, HDFS-7729-001.patch, HDFS-7729-002.patch, HDFS-7729-003.patch, HDFS-7729-004.patch, HDFS-7729-005.patch, HDFS-7729-006.patch, HDFS-7729-007.patch, HDFS-7729-008.patch
>
> If a client wants to directly write a file in striping layout, we need to add some logic to
> DFSOutputStream. DFSOutputStream needs multiple DataStreamers to write each cell of a stripe
> to a remote datanode.
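
One way to picture how cells might be spread over several streamers is a round-robin mapping from file offset to streamer index. This is only a sketch under the assumption that consecutive cells go to consecutive streamers; the class name, method name, and parameters are hypothetical and do not come from the patch.

```java
/** Hypothetical helper: maps a file offset to the streamer that would write its cell. */
final class StripeLayoutSketch {
  private StripeLayoutSketch() {}

  /**
   * Assumes cells of cellSize bytes are assigned to streamers in round-robin order,
   * so cell 0 goes to streamer 0, cell 1 to streamer 1, and so on, wrapping around.
   */
  static int streamerIndexForOffset(long offset, int cellSize, int numStreamers) {
    if (offset < 0 || cellSize <= 0 || numStreamers <= 0) {
      throw new IllegalArgumentException("offset must be >= 0; sizes must be > 0");
    }
    long cellIndex = offset / cellSize;       // which cell the offset falls into
    return (int) (cellIndex % numStreamers);  // round-robin over the streamers
  }
}
```

For example, with a cell size of 4 bytes and 3 streamers, offsets 0 to 3 map to streamer 0, offsets 4 to 7 to streamer 1, offsets 8 to 11 to streamer 2, and offset 12 wraps back to streamer 0.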

This message was sent by Atlassian JIRA
