hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7889) Subclass DFSOutputStream to support writing striping layout files
Date Mon, 30 Mar 2015 18:56:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387204#comment-14387204
] 

Zhe Zhang commented on HDFS-7889:
---------------------------------

Thanks Bo for the revised patch!

I think the main changes to {{DFSOutputStream}} are:
# Change some fields and methods from private to protected
# {{streamer}} field used to be accessed directly; now added a getter {{getStreamer()}}
# Refactor: separate out 2 methods in {{writeChunkImpl}}: {{adjustChunkBoundary}} and {{encounterBlockBoundary}}
# Refactor: separate out a {{getFileStatus}} method
# Adds logic to {{newStreamForCreate}} and {{newStreamForAppend}} to create striped output
streams based on file status.

Looks to me 1~4 can go into trunk under HDFS-7888. Regarding 5, since you already have an
abstract {{getFileStatus}} method, maybe it's better to do that checking in {{DFSClient}}?

The main logics in {{DFSStripedOutputStream}} and {{StripedDataStreamer}} look pretty good
to me overall. Some detailed issues to address:
# It's probably OK to pad zero-bytes now, but I think we should eventually get rid of it.
Instead of actually writing zero-bytes to the DNs hosting raw data blocks, we should just
flush the parity data. When we do that we can also get rid of {{bytesCurBlockGroup}}.
# 2nd constructor of {{StripedDataStreamer}} is not needed
# {{InterruptedException}} should be properly handled in {{locateFollowingBlock}}
# Array of generic objects is against Java best practice I think.
{code}
BlockingQueue<LocatedBlock>[] stripeBlocks = new BlockingQueue[blockGroupSize];
{code}
How about checking type when accessing stripedBlock elements, or just use an ArrayList? 
{code}
BlockingQueue<\?>[] stripeBlocks = new BlockingQueue[blockGroupSize];
...
assert stripedBlocks[i] instanceof LocatedBlock
Access stripedBlocks[i]
{code}

Nits:
# Typo in {{DFSOutputStream}}: clazzName
# In {{DFSStripedOutputStream}}: {{MaxBlockInQueue}} should be {{MaxBlocksInQueue}}


> Subclass DFSOutputStream to support writing striping layout files
> -----------------------------------------------------------------
>
>                 Key: HDFS-7889
>                 URL: https://issues.apache.org/jira/browse/HDFS-7889
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-7889-001.patch, HDFS-7889-002.patch, HDFS-7889-003.patch, HDFS-7889-004.patch
>
>
> After HDFS-7888, we can subclass  {{DFSOutputStream}} to support writing striping layout
files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message