hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7545) Data striping support in HDFS client
Date Mon, 02 Feb 2015 19:44:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301765#comment-14301765
] 

Zhe Zhang commented on HDFS-7545:
---------------------------------

Thanks Bo for the good work! The high level logic looks good.

Nits:
# We should avoid unnecessarily moving lines, such as {{private long currentSeqno = 0;}}
# Eclipse tries to automatically combine import entries, like below. We should avoid introducing
wildcard imports unnecessarily.
{code}
-import org.apache.hadoop.fs.CanSetDropBehind;
-import org.apache.hadoop.fs.CreateFlag;
-import org.apache.hadoop.fs.FSOutputSummer;
-import org.apache.hadoop.fs.FileAlreadyExistsException;
-import org.apache.hadoop.fs.FileEncryptionInfo;
-import org.apache.hadoop.fs.ParentNotDirectoryException;
-import org.apache.hadoop.fs.Syncable;
+import org.apache.hadoop.fs.*;
{code}
# {{stripeLayout}} should be {{stripedLayout}} or {{stripingLayout}}
# Typo: {{// if append , should read some packet ahean}}
# 3 blank lines at 2113~2115; double blank lines at other places.
# Some lines are too long (e.g. 2085). All lines should be within 80 chars.

Logics:
# Since the {{org.apache.hadoop.hdfs.ec}} package is still being developed, let's use a "dummy"
encode function in this patch, to make it functional (right now it doesn't compile).
#* In cases like this, we can add a {{//TODO}} item with the JIRA number
# We need a unit testing class
# NameNode patch is already in. Could you update accordingly?
#* Client still gets a {{LocatedBlock}} from NN, whose ID is the first block ID in the group.
#* From the above, the client calculates the IDs of remaining blocks in the group
# {{writeChunk}} has the logic of striping input data into the list of {{dataQueue}}s. This
is fairly complex and we need to clearly document it in the comments
# Is {{cellBuffers}} only used for parity calculation? It needs some documentation too.
# [Question] How large is each {{Packet}} in {{DFSOutputStream}}? I haven't read that part
of code in detail.
# The below handles queuing parity data packets. Have you tested it? Seems to me it will set
{{currentPacket}} to {{null}} and we'll lose the last data packet before the parity one. What's
reason of using {{queueCurrentPacket}} instead of {{waitAndQueueCurrentPacket}}?
{code}
        if (curIdx == blockGroupDataBlocks) {
          encode(cellBuffers);
          for (int i = blockGroupDataBlocks; i < blockGroupSize; i++) {
            ByteBuffer parityBuffer = cellBuffers[i];
            parityBuffer.flip();
            List<Packet> packets = generatePackets(parityBuffer);
            for (Packet p : packets) {
              currentPacket = p;
              queueCurrentPacket();
            }
{code}


> Data striping support in HDFS client
> ------------------------------------
>
>                 Key: HDFS-7545
>                 URL: https://issues.apache.org/jira/browse/HDFS-7545
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Li Bo
>         Attachments: DataStripingSupportinHDFSClient.pdf, HDFS-7545-001-DFSOutputStream.patch,
HDFS-7545-PoC.patch, clientStriping.patch
>
>
> Data striping is a commonly used data layout with critical benefits in the context of
erasure coding. This JIRA aims to extend HDFS client to work with striped blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message