hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7545) Data striping support in HDFS client
Date Mon, 02 Feb 2015 19:44:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301765#comment-14301765 ]

Zhe Zhang commented on HDFS-7545:

Thanks Bo for the good work! The high-level logic looks good.

# We should avoid unnecessarily moving lines, such as {{private long currentSeqno = 0;}}
# Eclipse tries to automatically combine import entries, like below. We should avoid introducing
wildcard imports unnecessarily.
-import org.apache.hadoop.fs.CanSetDropBehind;
-import org.apache.hadoop.fs.CreateFlag;
-import org.apache.hadoop.fs.FSOutputSummer;
-import org.apache.hadoop.fs.FileAlreadyExistsException;
-import org.apache.hadoop.fs.FileEncryptionInfo;
-import org.apache.hadoop.fs.ParentNotDirectoryException;
-import org.apache.hadoop.fs.Syncable;
+import org.apache.hadoop.fs.*;
# {{stripeLayout}} should be {{stripedLayout}} or {{stripingLayout}}
# Typo: {{// if append , should read some packet ahean}}
# 3 blank lines at 2113~2115; double blank lines at other places.
# Some lines are too long (e.g. 2085). All lines should be within 80 chars.

# Since the {{org.apache.hadoop.hdfs.ec}} package is still being developed, let's use a "dummy"
encode function in this patch, to make it functional (right now it doesn't compile).
#* In cases like this, we can add a {{//TODO}} item with the JIRA number
# We need a unit testing class
# NameNode patch is already in. Could you update accordingly?
#* Client still gets a {{LocatedBlock}} from NN, whose ID is the first block ID in the group.
#* From the above, the client calculates the IDs of remaining blocks in the group
# {{writeChunk}} has the logic of striping input data into the list of {{dataQueue}}s. This
is fairly complex, and we need to document it clearly in the comments.
# Is {{cellBuffers}} only used for parity calculation? It needs some documentation too.
# [Question] How large is each {{Packet}} in {{DFSOutputStream}}? I haven't read that part
of code in detail.
# The code below handles queuing parity data packets. Have you tested it? It seems to me it will set
{{currentPacket}} to {{null}} and we'll lose the last data packet before the parity one. What's
the reason for using {{queueCurrentPacket}} instead of {{waitAndQueueCurrentPacket}}?
        if (curIdx == blockGroupDataBlocks) {
          for (int i = blockGroupDataBlocks; i < blockGroupSize; i++) {
            ByteBuffer parityBuffer = cellBuffers[i];
            List<Packet> packets = generatePackets(parityBuffer);
            for (Packet p : packets) {
              currentPacket = p;
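
To make the points above about block IDs and {{writeChunk}} concrete, here is a minimal sketch
of the kind of logic the comments should explain. It is not the code from the patch. The class
and method names ({{StripingSketch}}, {{blockIdsInGroup}}, {{dataBlockIndexForCell}}, {{stripe}})
are hypothetical. It assumes that the IDs in a block group are consecutive, starting from the
ID of the {{LocatedBlock}} returned by the NN, and that cells are assigned to data blocks
round-robin. Both are assumptions for illustration and should be checked against the NameNode patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only; not the DFSOutputStream code from the patch. */
class StripingSketch {

  /**
   * Assumption: block IDs in a group are consecutive, starting at the ID
   * of the first block (the one carried by the LocatedBlock from the NN).
   */
  static long[] blockIdsInGroup(long firstBlockId, int groupSize) {
    long[] ids = new long[groupSize];
    for (int i = 0; i < groupSize; i++) {
      ids[i] = firstBlockId + i;
    }
    return ids;
  }

  /** Round-robin layout: cell k goes to data block (k mod dataBlocks). */
  static int dataBlockIndexForCell(long cellIndex, int dataBlocks) {
    return (int) (cellIndex % dataBlocks);
  }

  /**
   * Splits the input into cells of cellSize bytes and appends each cell to
   * the queue of the data block it belongs to. The last cell may be short.
   */
  static List<List<byte[]>> stripe(byte[] input, int cellSize,
      int dataBlocks) {
    List<List<byte[]>> queues = new ArrayList<>();
    for (int i = 0; i < dataBlocks; i++) {
      queues.add(new ArrayList<>());
    }
    long cellIndex = 0;
    for (int off = 0; off < input.length; off += cellSize) {
      int end = Math.min(off + cellSize, input.length);
      byte[] cell = Arrays.copyOfRange(input, off, end);
      queues.get(dataBlockIndexForCell(cellIndex, dataBlocks)).add(cell);
      cellIndex++;
    }
    return queues;
  }
}
```

For example, with 3 data blocks and a cell size of 2, a 7-byte input yields four cells: the
first and the last (short) cell land in queue 0, the second in queue 1, and the third in queue 2.
Parity generation and packet queuing are deliberately left out of the sketch.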

> Data striping support in HDFS client
> ------------------------------------
>                 Key: HDFS-7545
>                 URL: https://issues.apache.org/jira/browse/HDFS-7545
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Li Bo
>         Attachments: DataStripingSupportinHDFSClient.pdf, HDFS-7545-001-DFSOutputStream.patch,
HDFS-7545-PoC.patch, clientStriping.patch
> Data striping is a commonly used data layout with critical benefits in the context of
erasure coding. This JIRA aims to extend HDFS client to work with striped blocks.

This message was sent by Atlassian JIRA
