hadoop-hdfs-issues mailing list archives

From "Andrew Ryan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS
Date Tue, 01 Sep 2009 00:23:32 GMT

    [ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749701#action_12749701 ]

Andrew Ryan commented on HDFS-503:
----------------------------------

In speaking with Dhruba about this, I had one additional optimization to offer, which he suggested
I add to the issue.

If an HDFS client detects that it is forced to go to the parity block for a file because there
is a missing block, it should proactively perform, or initiate, the equivalent of the "fsckraid"
on the file that it is reading. Going to parity means that something is seriously wrong
(fsck will report 'CORRUPT', for example), so the repair should not wait for a periodic scan
of the filesystem to occur.
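
To make the idea concrete, here is a rough sketch of what such a client-side hook might look
like. None of these names come from the patch: ParityReadRepair, RepairTrigger, and
onParityFallback are hypothetical, and the de-duplication of requests per path is my own
assumption about how to avoid flooding whatever performs the repair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only: not part of HDFS or of the HDFS-503 patch. */
public class ParityReadRepair {

    /** Hypothetical hook standing in for whatever starts an "fsckraid"-style check. */
    public interface RepairTrigger {
        void requestRepair(String path);
    }

    private final RepairTrigger trigger;

    // Paths already reported, so repeated degraded reads of the same file
    // result in a single repair request (an assumption, not from the patch).
    private final Set<String> reported = ConcurrentHashMap.newKeySet();

    public ParityReadRepair(RepairTrigger trigger) {
        this.trigger = trigger;
    }

    /**
     * Intended to be called by the read path when a data block is missing and
     * the client had to fall back to parity. Returns true if a repair request
     * was issued, false if this path had already been reported.
     */
    public boolean onParityFallback(String path) {
        if (reported.add(path)) {
            trigger.requestRepair(path);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> queue = new ArrayList<>();
        ParityReadRepair repair = new ParityReadRepair(queue::add);
        repair.onParityFallback("/user/data/part-00000");
        repair.onParityFallback("/user/data/part-00000"); // duplicate, ignored
        System.out.println(queue);
    }
}
```

The point is only that the request is issued at read time, as soon as the fallback happens,
rather than being discovered later by a periodic scan.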

Also, the provided raid.xml in the patch contains only a subset of the important configuration
directives. I think it's nice when it includes all possible configuration directives, but
that's just personal preference.

> Implement erasure coding as a layer on HDFS
> -------------------------------------------
>
>                 Key: HDFS-503
>                 URL: https://issues.apache.org/jira/browse/HDFS-503
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: raid1.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS file system
can be reduced. Keeping three copies of the same data is very costly, especially when the
size of storage is huge. One idea is to reduce the replication factor and do erasure coding
of a set of blocks so that the overall probability of failure of a block remains the same as
before.
> Many forms of error-correcting codes are available, see http://en.wikipedia.org/wiki/Erasure_code.
Also, recent research from CMU has described DiskReduce https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is that we should discuss implementation strategies that are not part of base
HDFS, but are a layer on top of HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

