hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side
Date Thu, 22 Jan 2015 23:22:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288433#comment-14288433 ]

Zhe Zhang commented on HDFS-7653:

Thanks Bo for the patch. Please find my review below:

About the structure:
# Why do we need a {{UnifiedBlockReader}} interface, when {{BlockReader}} contains all the
required methods?
# {{BlockReader}}:
#* How do you plan to use {{ClientBlockReader}} in {{DFSInputStream}}? Will it replace the
current {{blockReader}}?
#* The purpose of a block reader is to read a single block from a single DataNode. Instead
of changing that logic, I think we need to start multiple block readers and coordinate them,
similar to the [design | https://issues.apache.org/jira/secure/attachment/12687886/DataStripingSupportinHDFSClient.pdf]
in HDFS-7545.
# {{BlockWriter}}:
#* The {{FSOutputSummer}} class is for _file_ output streams. I don't think it's appropriate
as a block writer.
#* In particular, as I mentioned in a [comment | https://issues.apache.org/jira/browse/HDFS-7344?focusedCommentId=14273774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273774]
under HDFS-7344, I think block transfers between DataNodes can be much simpler than client-side
pipelines. Unless there's an obvious advantage to streaming data to peer DataNodes on the fly,
I don't think we should introduce the complex write pipeline logic to the DN.
# I think _what can be shared_ between client and DN is the logic to coordinate multiple reading
or writing threads.
#* We can refer to how [Reader | https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Reader.cc]
and [Writer | https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Writer.cc] share
the [RSStriper | https://github.com/quantcast/qfs/blob/master/src/cc/libclient/RSStriper.cc]
logic in QFS.
#* To make things simpler, we can even start by developing the client and DN code separately,
and abstract out the common logic later on.
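
To illustrate the kind of coordination logic meant in the points above, here is a minimal sketch, not taken from the patch or from HDFS. {{SingleBlockReader}} and {{ReaderCoordinator}} are hypothetical names; the only assumption is that each reader serves exactly one block from one DataNode, and that a coordinator starts several of them in parallel and gathers one cell from each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a reader bound to one block on one DataNode. */
interface SingleBlockReader {
    /** Reads up to len bytes into buf at off; returns bytes read, or -1 at end of block. */
    int read(byte[] buf, int off, int len) throws Exception;
}

/** Hypothetical coordinator: one task per reader, results gathered in reader order. */
final class ReaderCoordinator {
    private final ExecutorService pool;

    ReaderCoordinator(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Reads one cell of up to cellSize bytes from every reader in parallel. */
    List<byte[]> readCells(List<SingleBlockReader> readers, int cellSize) throws Exception {
        List<Future<byte[]>> pending = new ArrayList<>();
        for (SingleBlockReader reader : readers) {
            Callable<byte[]> task = () -> readFully(reader, cellSize);
            pending.add(pool.submit(task));
        }
        List<byte[]> cells = new ArrayList<>();
        for (Future<byte[]> f : pending) {
            // A failure in any reader surfaces here as an ExecutionException.
            cells.add(f.get());
        }
        return cells;
    }

    /** Loops until cellSize bytes are read or the reader reports end of block. */
    private static byte[] readFully(SingleBlockReader reader, int cellSize) throws Exception {
        byte[] cell = new byte[cellSize];
        int filled = 0;
        while (filled < cellSize) {
            int n = reader.read(cell, filled, cellSize - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        return filled == cellSize ? cell : Arrays.copyOf(cell, filled);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Nothing in this coordinator depends on whether it runs in the client or in a DataNode, which is why this layer, rather than the single-block readers themselves, looks like the natural candidate for sharing.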

# {{ClientBlockReader#read(ByteBuffer buf)}}: what's the purpose of the following code? When
will a partially-used buffer be passed in?
    int remaining = buf.remaining();
    if(remaining <= 0)
      return 0;
    byte[] b = new byte[remaining];
    int r = read(b, 0, b.length);
    for(int i = 0; i < r; i++)
# {{DatanodeBlockReaderTest}}:
#* Name should follow the convention and start with {{Test}}
#* It doesn't compile; need to change {{MiniDFSCluster.getBlockFile}} to {{cluster.getBlockFile}}
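
For context on the partially-used buffer question: {{java.nio.ByteBuffer#remaining()}} returns limit minus position, so it is smaller than the capacity whenever the caller has already put bytes into the buffer or narrowed its limit. The sketch below is not the patch's code. {{ByteBufferReadSketch}} is a hypothetical class with an in-memory array standing in for block data. It follows the same shape as the quoted snippet, but copies with a single bulk {{put}} in place of a per-byte loop.

```java
import java.nio.ByteBuffer;

/** Hypothetical reader over an in-memory array; stands in for block data. */
final class ByteBufferReadSketch {
    private final byte[] source;
    private int pos = 0;

    ByteBufferReadSketch(byte[] source) {
        this.source = source;
    }

    /** Array-based read in the style of InputStream#read(byte[], int, int). */
    int read(byte[] b, int off, int len) {
        if (pos >= source.length) {
            return -1;
        }
        int n = Math.min(len, source.length - pos);
        System.arraycopy(source, pos, b, off, n);
        pos += n;
        return n;
    }

    /** Fills only the free space of buf (limit - position). */
    int read(ByteBuffer buf) {
        int remaining = buf.remaining();
        if (remaining <= 0) {
            return 0;
        }
        byte[] b = new byte[remaining];
        int r = read(b, 0, b.length);
        if (r > 0) {
            // Bulk copy; advances buf.position() by r.
            buf.put(b, 0, r);
        }
        return r;
    }
}
```

With a 4-byte buffer that already holds one byte, a call to {{read(ByteBuffer)}} can deliver at most 3 bytes, and the byte already in the buffer is left untouched.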

> Block Readers and Writers used in both client side and datanode side
> --------------------------------------------------------------------
>                 Key: HDFS-7653
>                 URL: https://issues.apache.org/jira/browse/HDFS-7653
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: BlockReadersWriters.patch
> There are many block read/write operations in HDFS-EC. For example, when a client writes
a file in striping layout, it has to write several blocks to several different datanodes;
if a datanode wants to do an encoding/decoding task, it has to read several blocks from itself
and other datanodes, and write one or more blocks to itself or other datanodes.

This message was sent by Atlassian JIRA