hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7345) Local Reconstruction Codes (LRC)
Date Tue, 11 Nov 2014 05:23:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205962#comment-14205962 ]

Kai Zheng commented on HDFS-7345:

From [Facebook’s advanced erasure codes|http://storagemojo.com/2013/06/21/facebooks-advanced-erasure-codes/]:
The LRC tests produced several key results.
* Disk I/O and network traffic were reduced by half compared to RS codes.
* The LRC required 14% more storage than RS, information theoretically optimal for the obtained
* Repair times were much lower thanks to the local repair codes.
* Much greater reliability thanks to fast repairs.
* Reduced network traffic makes them suitable for geographic distribution.
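
As a rough illustration of the trade-off in the list above, the sketch below compares repair reads and storage overhead for a Reed-Solomon layout and an LRC layout. The parameters (10 data blocks, 4 global parities, and one extra local parity for each of 2 groups of 5 data blocks) are assumptions chosen to be consistent with the quoted figures; they are not stated in this thread, and the function names are hypothetical.

```python
from fractions import Fraction

def rs_layout(k, m):
    """Reed-Solomon with k data and m parity blocks.
    Rebuilding a single lost block requires reading k surviving blocks."""
    return {"stored": k + m, "repair_reads": k}

def lrc_layout(k, global_parities, local_groups):
    """LRC with one extra local parity per group of data blocks.
    A single lost data block is rebuilt from the other members of its
    group plus that group's local parity, i.e. group_size reads."""
    group_size = k // local_groups
    return {"stored": k + global_parities + local_groups,
            "repair_reads": group_size}

# Assumed parameters, not taken from this thread.
rs = rs_layout(10, 4)
lrc = lrc_layout(10, 4, 2)

extra_storage = Fraction(lrc["stored"], rs["stored"]) - 1
read_ratio = Fraction(lrc["repair_reads"], rs["repair_reads"])
print(f"extra storage vs RS: {float(extra_storage):.1%}")
print(f"repair reads vs RS:  {float(read_ratio):.0%}")
```

Under these assumed parameters the LRC layout stores 16 blocks instead of 14 (about 14% more) and reads 5 blocks instead of 10 to repair a single lost data block.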

So it looks like LRC is quite appealing for HDFS. I'm wondering whether there is any IP concern
if we implement it. The concern arises because LRC comes from Microsoft Research, and I haven't
yet received any confirmation that it's available to the community.

Could anyone help confirm whether there is an IP concern with LRC?

> Local Reconstruction Codes (LRC)
> --------------------------------
>                 Key: HDFS-7345
>                 URL: https://issues.apache.org/jira/browse/HDFS-7345
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
> HDFS-7285 proposes to support erasure coding inside HDFS, with multiple erasure coding
codecs available through a pluggable framework and Reed-Solomon implemented by default. This sub-task
adds a more advanced coding mechanism, Local Reconstruction Codes (LRC). As discussed in the paper
(https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf), LRC reduces the
number of erasure coding fragments that need to be read when reconstructing data fragments
that are offline, while still keeping the storage overhead low. The important benefits of
LRC are that it reduces the bandwidth and I/Os required for repair reads over prior codes,
while still allowing a significant reduction in storage overhead. The Intel ISA library has also
added LRC support in an update and can be leveraged. The implementation would also consider how to
distribute the calculation of local and global parity blocks to other relevant DataNodes.
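
The local-repair idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HDFS or Intel ISA implementation: it uses plain XOR for the local parities, omits the global parities entirely (those would need Galois-field arithmetic, as in Reed-Solomon), and all function names and the group size are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_local_parities(data_blocks, group_size):
    """Split the data blocks into consecutive local groups and
    compute one XOR parity block per group."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [xor_blocks(group) for group in groups]

def repair_block(data_blocks, local_parities, lost_index, group_size):
    """Rebuild one lost data block by reading only its own local group.
    Returns the rebuilt block and the number of blocks read."""
    group = lost_index // group_size
    start = group * group_size
    end = min(start + group_size, len(data_blocks))
    survivors = [data_blocks[i] for i in range(start, end) if i != lost_index]
    reads = survivors + [local_parities[group]]
    return xor_blocks(reads), len(reads)

# Six 4-byte data blocks in two local groups of three (assumed sizes).
data = [bytes([value] * 4) for value in range(1, 7)]
parities = encode_local_parities(data, group_size=3)

damaged = list(data)
damaged[4] = None  # simulate an offline fragment
rebuilt, reads = repair_block(damaged, parities, lost_index=4, group_size=3)
print(rebuilt == data[4], reads)
```

Here the lost block is recovered from three reads (two surviving group members plus the local parity) rather than from all remaining data blocks, which is the reduction in repair reads the description refers to.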

