hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7354) Support parity blocks in block management
Date Fri, 23 Jan 2015 01:29:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288556#comment-14288556 ]

Zhe Zhang commented on HDFS-7354:

[~szetszwo] Good point, under the striping layout, we don't group multiple files in the same
block group. So each parity block can indeed be treated as part of the file.

Parity blocks still need special treatment though, because they should be recovered with lower
priority than data blocks. I'll update the JIRA description.
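Here is a minimal sketch of that recovery ordering. It assumes a hypothetical `LostBlock` type that carries an `isParity` flag and a hypothetical `RecoveryQueue`. Neither is an HDFS class, and the sketch does not model the NameNode's actual recovery queues. It only shows missing data blocks being scheduled ahead of missing parity blocks, with the block ID used as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Demo entry point (listed first so single-file `java RecoveryOrderDemo.java` works).
class RecoveryOrderDemo {
    public static void main(String[] args) {
        RecoveryQueue q = new RecoveryQueue();
        q.add(new LostBlock(2L, true));   // parity
        q.add(new LostBlock(9L, false));  // data
        q.add(new LostBlock(4L, false));  // data
        q.add(new LostBlock(1L, true));   // parity
        System.out.println(q.drainOrder()); // [4, 9, 1, 2]
    }
}

// Hypothetical model of a lost block in a striped block group (not an HDFS class).
final class LostBlock {
    final long blockId;
    final boolean isParity;

    LostBlock(long blockId, boolean isParity) {
        this.blockId = blockId;
        this.isParity = isParity;
    }
}

// Hypothetical recovery queue: data blocks (isParity == false) sort before
// parity blocks because Boolean.FALSE < Boolean.TRUE; ties break on block ID.
final class RecoveryQueue {
    private final PriorityQueue<LostBlock> queue = new PriorityQueue<>(
        Comparator.comparing((LostBlock b) -> b.isParity)
                  .thenComparingLong(b -> b.blockId));

    void add(LostBlock b) {
        queue.add(b);
    }

    // Removes every queued block in recovery order and returns their IDs.
    List<Long> drainOrder() {
        List<Long> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().blockId);
        }
        return order;
    }
}
```

A real scheduler would probably weigh more than this single flag, for example how many blocks of the group are already missing. The comparator only captures the "data before parity" rule stated above.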

This JIRA was originally created for the [old design | https://issues.apache.org/jira/secure/attachment/12677810/HDFSErasureCodingDesign-20141028.pdf]
with a non-striping (contiguous) layout, where parity blocks could be orphans. We chose not
to implement that in the initial phase because it complicates file deletion, but I don't think
it's completely off the table. We might revisit it in the future to support applications
that have strict locality requirements.

> Support parity blocks in block management
> -----------------------------------------
>                 Key: HDFS-7354
>                 URL: https://issues.apache.org/jira/browse/HDFS-7354
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
> In the current block management system, each block is associated with a file. "Orphan"
> blocks are considered corrupt and will be removed.
> In this JIRA we extend {{Block}} with a binary flag denoting whether it is a parity block
> ({{isParity}}). Parity blocks are created, stored, and reported the same way as raw ones.
> They have regular block IDs which are unrelated to those of the raw blocks in the same group;
> their replicas (normally only 1) are stored in RBW and finalized directories on the DataNode
> depending on the stage; they are also included in block reports. The only distinction of a
> parity block is the lack of file affiliation. The block management system will be aware of
> parity blocks and will _not_ try to remove them.
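The contiguous-layout rule in the description can be sketched roughly as follows. `SketchBlock` and `BlockMapSketch` are hypothetical stand-ins, not HDFS's `Block` or its block map. The sketch assumes a block counts as a corrupt orphan only when it has no owning file and does not carry the `isParity` flag.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point (listed first so single-file `java OrphanCheckDemo.java` works).
class OrphanCheckDemo {
    public static void main(String[] args) {
        BlockMapSketch blocks = new BlockMapSketch();
        blocks.register(new SketchBlock(1001L, false), "/user/a/file1"); // raw block with a file
        blocks.register(new SketchBlock(2001L, true), null);             // parity block, no file
        blocks.register(new SketchBlock(3001L, false), null);            // raw orphan
        System.out.println(blocks.shouldRemove(1001L)); // false
        System.out.println(blocks.shouldRemove(2001L)); // false
        System.out.println(blocks.shouldRemove(3001L)); // true
    }
}

// Hypothetical stand-in for a block extended with the isParity flag.
final class SketchBlock {
    final long blockId;
    final boolean isParity;

    SketchBlock(long blockId, boolean isParity) {
        this.blockId = blockId;
        this.isParity = isParity;
    }
}

// Hypothetical block map that records which file (if any) owns each block.
final class BlockMapSketch {
    private final Map<Long, SketchBlock> blocks = new HashMap<>();
    private final Map<Long, String> owningFile = new HashMap<>();

    // filePath may be null for blocks without file affiliation.
    void register(SketchBlock b, String filePath) {
        blocks.put(b.blockId, b);
        if (filePath != null) {
            owningFile.put(b.blockId, filePath);
        }
    }

    // An unaffiliated block is removable unless it is flagged as parity.
    boolean shouldRemove(long blockId) {
        SketchBlock b = blocks.get(blockId);
        if (b == null) {
            return false; // unknown block: nothing to decide here
        }
        return !b.isParity && !owningFile.containsKey(blockId);
    }
}
```

This also hints at why file deletion gets harder in this design: a file's parity blocks have no link back to it, so something else would need to track which parity blocks to drop when the file goes away.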

This message was sent by Atlassian JIRA
