hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
Date Tue, 21 Apr 2015 06:33:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504428#comment-14504428 ]

Yi Liu edited comment on HDFS-8033 at 4/21/15 6:33 AM:
-------------------------------------------------------

Thanks [~zhz] for working on this.  The patch looks good; my comments:
*1.*  In DFSInputStream, a stateful read is not required to fill the output *buf*: {{readWithStrategy}}
calls {{readBuffer}} and returns on success.  In {{DFSStripedInputStream}} we override
{{readBuffer}}, but it reads from only one striped block, so the returned result is something
like (cell_0, cell_3, ....), which contains only part of the expected data.
This is not correct.  The test does exercise stateful read, but it reads fully
and the data size is *BLOCK_GROUP_SIZE*, so the result is correct only by coincidence.
I suggest that {{readBuffer}} in {{DFSStripedInputStream}} try to read fully unless it
reaches the end of the file; the final read length can of course be less than the input buf
length if we hit EOF.
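
To illustrate point 1, here is a minimal, self-contained sketch (not the patch code) of the difference between a single read that stops at a cell boundary and a loop that keeps reading until the buffer is full or EOF. The class {{ShortReadSketch}}, the {{Source}} interface, and the {{cellLimited}} helper are hypothetical stand-ins; the real logic lives in {{readWithStrategy}} and {{readBuffer}}.

```java
// Hypothetical sketch, not Hadoop code: all names here are illustrative.
class ShortReadSketch {
    /** Minimal stand-in for a stateful reader: returns bytes read, or -1 at EOF. */
    interface Source {
        int read(byte[] buf, int off, int len);
    }

    /** A source over `data` that returns at most `cellSize` bytes per call. */
    static Source cellLimited(byte[] data, int cellSize) {
        int[] pos = {0}; // current position, advanced by every read (stateful)
        return (buf, off, len) -> {
            if (pos[0] >= data.length) {
                return -1; // EOF
            }
            int n = Math.min(Math.min(len, cellSize), data.length - pos[0]);
            System.arraycopy(data, pos[0], buf, off, n);
            pos[0] += n;
            return n;
        };
    }

    /** Reads until buf is full or EOF; returns the number of bytes read (0 at EOF). */
    static int readFully(Source src, byte[] buf) {
        int total = 0;
        while (total < buf.length) {
            int n = src.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // EOF: the result may be shorter than buf.length
            }
            total += n;
        }
        return total;
    }
}
```

With a 10-byte source and a cell size of 4, a single {{read}} into a 16-byte buffer returns 4 bytes, while {{readFully}} returns all 10 and stops at EOF.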

*2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. Any
other IOException can simply be thrown.
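
As a rough sketch of the retry shape point 2 asks for (an illustration only, not the actual {{blockSeekTo}} code): allow one retry for a stale token and one for a stale encryption key, and let every other IOException propagate. The exception classes, the {{Connector}} interface, and the {{attempts}} demo helper below are all hypothetical names invented for this example.

```java
import java.io.IOException;

// Hypothetical sketch, not Hadoop code: all names are illustrative stand-ins.
class RefetchRetrySketch {
    static class StaleTokenException extends IOException {}
    static class StaleKeyException extends IOException {}

    /** One connection attempt; returns normally on success. */
    interface Connector {
        void connect() throws IOException;
    }

    /** Retries once per credential type; any other IOException propagates. */
    static void connectWithRefetch(Connector c) throws IOException {
        int refetchToken = 1;         // remaining token refetches
        int refetchEncryptionKey = 1; // remaining key refetches
        while (true) {
            try {
                c.connect();
                return;
            } catch (StaleKeyException e) {
                if (refetchEncryptionKey-- <= 0) {
                    throw e;
                }
                // a real client would discard its cached key here, then retry
            } catch (StaleTokenException e) {
                if (refetchToken-- <= 0) {
                    throw e;
                }
                // a real client would refetch the block token here, then retry
            }
        }
    }

    /** Demo: fail `staleTokens` then `staleKeys` times; returns attempts made, or -1 on failure. */
    static int attempts(int staleTokens, int staleKeys) {
        int[] calls = {0};
        Connector c = () -> {
            int n = calls[0]++;
            if (n < staleTokens) {
                throw new StaleTokenException();
            }
            if (n < staleTokens + staleKeys) {
                throw new StaleKeyException();
            }
        };
        try {
            connectWithRefetch(c);
            return calls[0];
        } catch (IOException e) {
            return -1;
        }
    }
}
```

In this sketch, one stale token followed by one stale key succeeds on the third attempt, while a second stale token exhausts the retry budget and the exception is rethrown.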

*3.*  For the test, do stateful read both ways: read once and read fully (please make the data size
larger than groupSize * cellSize), as I said in #1.
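
To show why the size in point 3 matters, here is a small sketch of round-robin cell placement. It assumes that groupSize means the number of data blocks in a group and that cells are laid out round-robin across them, which matches the (cell_0, cell_3, ....) example in #1 for three data blocks. The class and method names are hypothetical and this is not Hadoop code.

```java
// Hypothetical sketch of round-robin striping; assumes groupSize == number of data blocks.
class StripeLayoutSketch {
    /** Index of the data block holding the byte at the given logical offset. */
    static int blockIndex(long offset, int cellSize, int dataBlocks) {
        return (int) ((offset / cellSize) % dataBlocks);
    }

    /** Offset of that byte inside its data block. */
    static long offsetInBlock(long offset, int cellSize, int dataBlocks) {
        long stripe = offset / ((long) cellSize * dataBlocks);
        return stripe * cellSize + offset % cellSize;
    }

    /** Number of stripes touched by a file of the given length. */
    static long stripes(long length, int cellSize, int dataBlocks) {
        long stripeSize = (long) cellSize * dataBlocks;
        return (length + stripeSize - 1) / stripeSize;
    }
}
```

Under these assumptions, a file of exactly groupSize * cellSize bytes fits in one stripe, so a sequential read never has to return to the first block. One extra byte creates a second stripe, which is the case the test should cover.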

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why did you modify {{SimulatedFSDataset}}?


> Erasure coding: stateful (non-positional) read from files in striped layout
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-8033
>                 URL: https://issues.apache.org/jira/browse/HDFS-8033
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
