hadoop-common-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
Date Mon, 11 May 2015 10:05:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537790#comment-14537790 ]

Kai Zheng commented on HADOOP-11828:

Hi Jack, thanks for the clarification.

For the first 3 points, I would really like them resolved first, since they're clear to us
now and resolving them lays a more solid base for implementing the other two modes later.
That way we won't need big changes after this is committed. I understand the process isn't
very productive, but that's the pain of open source. I really wish we could get this in sooner,
but it needs more reviews from more people, so I guess you will have chances to make the
code cleaner and more elegant.
bq. HH is specific in preparing input data in decoding
I don't think so. Any erasure code is used to encode and decode arbitrary user data, so we
don't need to prepare the input specially for it.
bq. The current testCoding() in TestErasureCoderBase uses the remaining 9 data units + 4 parity
units to reconstruct the one missing data unit.
Yes, it does for now. It will be corrected in HADOOP-11847. I thought it was fine to customize
the {{testCoding}} logic here, but in the future we should consolidate the code into the parent class.
bq. I have no good idea, because RS encoding will erase the input data.
I see. I don't have one either; looking at the RS code, it's not easy to avoid the erasure.
Let's optimize it in the future, once we get everything working correctly.
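For the last point, a minimal sketch (in Python, not Hadoop's Java API) of the usual workaround when an encoder overwrites its inputs: hand it private copies. `clobbering_encode` and `encode_preserving_inputs` are hypothetical names; the stand-in encoder computes a single XOR parity and zeroes its inputs only to mimic the behavior described above. It is not the RS coder.

```python
from typing import Callable, List

Unit = bytearray  # one data or parity unit


def clobbering_encode(inputs: List[Unit]) -> List[Unit]:
    """Hypothetical stand-in for an encoder that uses its input buffers as
    scratch space: it returns one XOR parity unit but zeroes every input."""
    parity = bytearray(len(inputs[0]))
    for buf in inputs:
        for n, byte in enumerate(buf):
            parity[n] ^= byte
        buf[:] = bytes(len(buf))  # destroys the caller's data
    return [parity]


def encode_preserving_inputs(encode: Callable[[List[Unit]], List[Unit]],
                             inputs: List[Unit]) -> List[Unit]:
    """Run `encode` on private copies so the caller's buffers survive."""
    scratch = [bytearray(buf) for buf in inputs]
    return encode(scratch)


data = [bytearray(b"\x01\x02"), bytearray(b"\x04\x08")]
parity_units = encode_preserving_inputs(clobbering_encode, data)
# data is unchanged; parity_units holds the XOR of the two units
```

The cost is one extra copy of every input buffer per call, which is presumably the kind of overhead a later optimization would try to avoid.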

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>                 Key: HADOOP-11828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11828
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: jack liuquan
>         Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 7715-hitchhikerXOR-v2.patch,
> HADOOP-11828-hitchhikerXOR-V3.patch, HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch,
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf]
> is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been
> shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction. This JIRA
> aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 
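A toy sketch of the piggybacking idea behind Hitchhiker-XOR, as we read the linked paper: each stripe is split into two substripes a and b; b's first parity is stored plain, and each later parity of b also carries the XOR of one group of a's data units. The `parity` function is a hypothetical stand-in (a Vandermonde-style code over GF(2^8) whose first row is plain XOR), not Hadoop's RS coder and not checked to be MDS. Only repair of a single data node is shown, and all names are illustrative.

```python
from typing import List

Unit = List[int]  # one unit = a list of byte values (0..255)


def gf_mul(x: int, y: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    product = 0
    while y:
        if y & 1:
            product ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
        y >>= 1
    return product


def gf_pow(x: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, x)
    return result


def xor_units(units: List[Unit]) -> Unit:
    out = [0] * len(units[0])
    for u in units:
        out = [p ^ q for p, q in zip(out, u)]
    return out


def parity(units: List[Unit], row: int) -> Unit:
    """Toy parity row: XOR over l of 2^(row*l) * unit_l in GF(2^8).
    Row 0 has every coefficient equal to 1, i.e. plain XOR parity."""
    out = [0] * len(units[0])
    for l, u in enumerate(units):
        c = gf_pow(2, row * l)
        out = [o ^ gf_mul(c, v) for o, v in zip(out, u)]
    return out


def hh_xor_encode(a, b, r, groups):
    """Parities for substripes a and b. Parity 0 of b is left plain;
    parity j >= 1 of b also carries the XOR of a's units in groups[j-1]."""
    pa = [parity(a, j) for j in range(r)]
    pb = [parity(b, 0)]
    for j in range(1, r):
        piggyback = xor_units([a[i] for i in groups[j - 1]])
        pb.append([p ^ q for p, q in zip(parity(b, j), piggyback)])
    return pa, pb


def repair_data_node(i, a, b, pb, groups):
    """Rebuild a[i] and b[i] without reading them; also count units read."""
    k = len(b)
    # 1) b[i] from the other b data units plus the plain XOR parity pb[0].
    b_i = xor_units([b[l] for l in range(k) if l != i] + [pb[0]])
    reads = k  # (k - 1) data units + 1 parity unit
    # 2) Strip parity j+1 of the full b substripe off the piggybacked unit.
    j = next(g for g, members in enumerate(groups) if i in members)
    full_b = [b_i if l == i else b[l] for l in range(k)]
    piggyback = [p ^ q for p, q in zip(pb[j + 1], parity(full_b, j + 1))]
    reads += 1
    # 3) a[i] = piggyback XOR the other a units in the same group.
    peers = [a[l] for l in groups[j] if l != i]
    a_i = xor_units([piggyback] + peers)
    reads += len(peers)
    return a_i, b_i, reads
```

With k = 4, r = 3 and two groups of two units, each data-node repair here reads 6 units rather than the 8 (k per substripe) a plain RS repair would read. The 25%-45% figure quoted above comes from the paper's own configurations, not from this toy.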

This message was sent by Atlassian JIRA
