hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
Date Sun, 03 May 2015 15:19:06 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Attachment: HDFS-7348.002.patch

Thanks, Zhe, for the good comments.

I have updated the patch according to our discussion and addressed the comments. Main changes
to the patch:
*1.* The buffer size is now configurable; the default is 256KB, the same as the default cell
size.
*2.* Added encode and decode logic for recovery. If all missing blocks are parity blocks, we
need to encode; there is a possible improvement here, for which I filed HADOOP-11908. If any
missing block is a data block, we need to decode. Currently I found that decode only works for
data blocks, and we also need to prepare full inputs, as Zhe said. So the decode logic in the
patch is a workaround and only works for parityBlkNum missing data blocks. We can update it
after HADOOP-11847.
*3.* Enhanced the test cases; they pass in my local environment.
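
The encode-or-decode choice in item 2 can be sketched as below. This is a toy model, not code from the patch: the class and method names are hypothetical, the layout (data blocks at indices 0..dataBlkNum-1, parity blocks after them) and the 6+3 numbers in {{main}} are assumptions, and it does not model the restriction of the decode workaround.

```java
import java.util.Arrays;

/** Toy model of the encode-vs-decode choice; not code from the patch. */
final class RecoveryPlanSketch {
    enum Action { ENCODE, DECODE, UNRECOVERABLE }

    /**
     * Assumed layout: indices 0..dataBlkNum-1 are data blocks,
     * the remaining parityBlkNum indices are parity blocks.
     */
    static Action choose(int dataBlkNum, int parityBlkNum, int[] missing) {
        if (missing.length == 0) {
            throw new IllegalArgumentException("nothing to recover");
        }
        // More losses than parity blocks cannot be repaired.
        if (missing.length > parityBlkNum) {
            return Action.UNRECOVERABLE;
        }
        // Any missing data block forces a decode; otherwise the parity
        // blocks can simply be re-encoded from the intact data blocks.
        boolean anyDataMissing =
            Arrays.stream(missing).anyMatch(i -> i < dataBlkNum);
        return anyDataMissing ? Action.DECODE : Action.ENCODE;
    }

    public static void main(String[] args) {
        // Hypothetical 6 data + 3 parity block group.
        System.out.println(choose(6, 3, new int[] {6, 8}));       // parity only
        System.out.println(choose(6, 3, new int[] {2, 7}));       // one data block
        System.out.println(choose(6, 3, new int[] {0, 1, 2, 3})); // too many
    }
}
```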

Zhe, below are replies to some of your comments; I have addressed your other comments in the
patch:
{quote}
Why do we need targetInputStreams?
{quote}
My original design was to do a packet ack check. We can do that in phase 2, so I removed it
from the current patch.

{quote}
The test failed on my local machine, reporting NPE when closing file
{quote}
I found it's a bug in the existing code and filed HDFS-8313 for it. The exception occurs occasionally.

{quote}
cluster#stopDataNode might be an easier way to kill a DN?
{quote}
{{stopDataNode}} only shuts down the DN, and the NN needs to wait a long time before marking
the datanode as dead. So, as I said in the test comment, we need to clear its update time and
trigger the NN to check heartbeats; the NN will then mark the datanode as dead immediately and
can schedule striped block recovery.
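
To illustrate why clearing the update time matters, here is a self-contained toy model of heartbeat expiry. It does not use the real NameNode or MiniDFSCluster classes; {{HeartbeatSketch}}, its methods, and the expiry value are all hypothetical names and numbers chosen for the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of heartbeat expiry; not the NameNode implementation. */
final class HeartbeatSketch {
    private final long expiryMs;
    private final Map<String, Long> lastUpdateMs = new HashMap<>();
    private final Set<String> dead = new HashSet<>();

    HeartbeatSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Records a heartbeat from a live datanode. */
    void heartbeat(String dn, long nowMs) {
        lastUpdateMs.put(dn, nowMs);
        dead.remove(dn);
    }

    /** Test hook: pretend the datanode has not reported for a very long time. */
    void clearLastUpdate(String dn) {
        lastUpdateMs.put(dn, 0L);
    }

    /** Marks every datanode whose last update is older than the expiry as dead. */
    void heartbeatCheck(long nowMs) {
        for (Map.Entry<String, Long> e : lastUpdateMs.entrySet()) {
            if (nowMs - e.getValue() > expiryMs) {
                dead.add(e.getKey());
            }
        }
    }

    boolean isDead(String dn) {
        return dead.contains(dn);
    }

    public static void main(String[] args) {
        HeartbeatSketch nn = new HeartbeatSketch(630_000L); // illustrative expiry
        nn.heartbeat("dn1", 1_000_000L);

        // The DN was just stopped: a check one second later still sees it alive.
        nn.heartbeatCheck(1_001_000L);
        System.out.println("dead after stop only: " + nn.isDead("dn1"));

        // Clearing the update time makes the same check mark it dead at once.
        nn.clearLastUpdate("dn1");
        nn.heartbeatCheck(1_001_000L);
        System.out.println("dead after clearing update time: " + nn.isDead("dn1"));
    }
}
```

In this model, stopping the datanode alone changes nothing until the expiry interval passes, which is the long wait described above.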

{quote}
Should WRITE_PACKET_SIZE be linked to BlockSender#MIN_BUFFER_WITH_TRANSFERTO
{quote}
{{BlockSender#MIN_BUFFER_WITH_TRANSFERTO}} is for transfers in contiguous block replication,
which is a little different (it transfers the file directly). I don't want to tie the two
together; I think it's fine to define the value directly.

{quote}
Follow on: we should consider consolidating the init thread pool logic for hedged read, client
striped read, and DN striped read.
{quote}
Yes, we can do it in a follow-on.

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch
>
>
> This JIRA is to recover one or more missing striped blocks in the striped block group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
