hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side
Date Thu, 22 Jan 2015 23:22:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288433#comment-14288433 ]

Zhe Zhang commented on HDFS-7653:
---------------------------------

Thanks Bo for the patch. Please find my review below:

About the structure:
# Why do we need a {{UnifiedBlockReader}} interface, when {{BlockReader}} contains all the
required methods?
# {{BlockReader}}:
#* How do you plan to use {{ClientBlockReader}} in {{DFSInputStream}}? Will it replace the
current {{blockReader}}?
#* The purpose of a block reader is to read a single block from a single DataNode. Instead
of changing that logic, I think we need to start multiple block readers and coordinate them,
similar to the [design | https://issues.apache.org/jira/secure/attachment/12687886/DataStripingSupportinHDFSClient.pdf]
in HDFS-7545.
# {{BlockWriter}}:
#* The {{FSOutputSummer}} class is for _file_ output streams. I don't think it's appropriate
as a block writer.
#* In particular, as I mentioned in a [comment | https://issues.apache.org/jira/browse/HDFS-7344?focusedCommentId=14273774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273774]
under HDFS-7344, I think block transfers between DataNodes can be much simpler than client-side
pipelines. Unless there's an obvious advantage to streaming data to peer DataNodes on the fly,
I don't think we should introduce the complex write pipeline logic to the DN.
# I think _what can be shared_ between client and DN is the logic to coordinate multiple reading
or writing threads.
#* We can refer to how [Reader | https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Reader.cc]
and [Writer | https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Writer.cc] share
the [RSStriper | https://github.com/quantcast/qfs/blob/master/src/cc/libclient/RSStriper.cc]
logic in QFS. 
#* To keep things simple, we can even start by developing the client and DN code separately,
and abstract out the common logic later on.
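
To illustrate the point about keeping each block reader single-block and coordinating several of them, here is a minimal sketch. It is not code from the patch or from HDFS: {{SingleBlockReader}} and {{StripedReadCoordinator}} are hypothetical names, and the one-thread-per-reader pool is only an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface: one reader reads one block from one DataNode.
interface SingleBlockReader {
    // Returns the number of bytes read, or -1 at the end of the block.
    int read(byte[] dst, int off, int len) throws Exception;
}

// Hypothetical coordinator: the readers stay simple, the striping logic lives here.
final class StripedReadCoordinator {
    private final List<SingleBlockReader> readers;

    StripedReadCoordinator(List<SingleBlockReader> readers) {
        this.readers = readers;
    }

    /** Reads up to cellSize bytes from every reader in parallel; one buffer per reader. */
    byte[][] readStripe(int cellSize) throws Exception {
        if (readers.isEmpty()) {
            return new byte[0][];
        }
        ExecutorService pool = Executors.newFixedThreadPool(readers.size());
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (SingleBlockReader reader : readers) {
                pending.add(pool.submit(() -> {
                    byte[] cell = new byte[cellSize];
                    int filled = 0;
                    while (filled < cellSize) {
                        int n = reader.read(cell, filled, cellSize - filled);
                        if (n < 0) {
                            break; // end of block: the rest of the cell stays zeroed
                        }
                        filled += n;
                    }
                    return cell;
                }));
            }
            byte[][] stripe = new byte[readers.size()][];
            for (int i = 0; i < stripe.length; i++) {
                stripe[i] = pending.get(i).get(); // rethrows a reader failure as ExecutionException
            }
            return stripe;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a layout like this, the same coordinator could in principle be handed client-side or DN-side readers, which is the kind of sharing suggested above.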

Nits:
# {{ClientBlockReader#read(ByteBuffer buf)}}: what's the purpose of the following code? When
will a partially-used buffer be passed in?
{code}
    int remaining = buf.remaining();
    if(remaining <= 0)
      return 0;
    byte[] b = new byte[remaining];
    int r = read(b, 0, b.length);
    for(int i = 0; i < r; i++)
      buf.put(b[i]);
{code}
# {{DatanodeBlockReaderTest}}:
#* The class name should follow the convention and start with {{Test}}.
#* It doesn't compile; {{MiniDFSCluster.getBlockFile}} needs to be changed to {{cluster.getBlockFile}}.
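
For the {{read(ByteBuffer buf)}} nit, one possible shape that honors the buffer's current position and avoids the byte-by-byte loop is sketched below. This is only an assumption about what the method is meant to do: {{ArrayBackedReader}} is a hypothetical stand-in for the class in the patch, and only a {{read(byte[], int, int)}} method is assumed to exist.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for the reader under review.
abstract class ArrayBackedReader {
    // Assumed to exist in the patch: returns bytes read, or -1 at end of block.
    abstract int read(byte[] b, int off, int len) throws IOException;

    int read(ByteBuffer buf) throws IOException {
        int remaining = buf.remaining();
        if (remaining <= 0) {
            return 0;
        }
        if (buf.hasArray()) {
            // Heap buffer: read straight into the backing array at the current position.
            int r = read(buf.array(), buf.arrayOffset() + buf.position(), remaining);
            if (r > 0) {
                buf.position(buf.position() + r);
            }
            return r;
        }
        // Direct or read-only buffer: one temporary array, then a single bulk put.
        byte[] tmp = new byte[remaining];
        int r = read(tmp, 0, tmp.length);
        if (r > 0) {
            buf.put(tmp, 0, r);
        }
        return r;
    }
}
```

Either branch leaves {{buf}} advanced by exactly the number of bytes read, so a partially used buffer is handled the same way as an empty one.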

> Block Readers and Writers used in both client side and datanode side
> --------------------------------------------------------------------
>
>                 Key: HDFS-7653
>                 URL: https://issues.apache.org/jira/browse/HDFS-7653
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: BlockReadersWriters.patch
>
>
> There are a lot of block read/write operations in HDFS-EC. For example, when a client writes
a file in striping layout, it has to write several blocks to several different datanodes;
if a datanode wants to do an encoding/decoding task, it has to read several blocks from itself
and other datanodes, and write one or more blocks to itself or other datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
