hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7715) Implement the Hitchhiker erasure coding algorithm
Date Wed, 25 Mar 2015 14:37:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379957#comment-14379957 ]

Kai Zheng commented on HDFS-7715:

Hi Jack,
bq.I just modify it for building in my machine. sorry for that.
No need to apologize; I understand.
bq.I have refined my codes with your suggestions, coding style using google style.
I'm a little confused: I thought we should use the Hadoop coding style. If you're not sure
what that is, please read:
bq.I think we should balance gains between using native RS raw coders and separate them.
I'm glad you confirmed that it should be doable for you to reuse the existing raw coders for XOR and
RS codes. I think performance should be acceptable for now, particularly since
we will have native implementations of the raw coders. In any case, reusing the existing raw coders
would save us a lot of low-level code and let us focus on the HH-specific algorithms.
Your current effort is still worthwhile, since it helps you get familiar with the code. Regarding your
performance improvement, what is it? Maybe you can apply it to the Java implementation of the RS raw
coder? I will come up with a benchmark tool to compare the performance of raw coders in HADOOP-11588.
I hope it will help.

In your current implementation, you have hard-coded the matrix and the parameters (10, 4). I'm wondering
whether that could be resolved, since we want a code that is configurable and flexible. Would it help
if we reused the existing raw coders?
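
To illustrate the configurability concern, here is a minimal sketch of an XOR coder whose number of data units is a constructor argument rather than a hard-coded constant. The class name `ConfigurableXorCoder` and its methods are hypothetical; they are not taken from the attached patches or from Hadoop's raw coder API, and they only show the general idea of passing the schema in from outside.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hadoop's raw coder API): an XOR coder whose
 * number of data units is supplied by the caller instead of being fixed.
 */
public class ConfigurableXorCoder {
  private final int numDataUnits;

  public ConfigurableXorCoder(int numDataUnits) {
    if (numDataUnits < 1) {
      throw new IllegalArgumentException("numDataUnits must be >= 1");
    }
    this.numDataUnits = numDataUnits;
  }

  /** XORs all data units byte by byte into a single parity unit. */
  public byte[] encode(byte[][] dataUnits) {
    if (dataUnits.length != numDataUnits) {
      throw new IllegalArgumentException("expected " + numDataUnits + " data units");
    }
    int len = dataUnits[0].length;
    byte[] parity = new byte[len];
    for (byte[] unit : dataUnits) {
      if (unit.length != len) {
        throw new IllegalArgumentException("data units must have equal length");
      }
      for (int i = 0; i < len; i++) {
        parity[i] ^= unit[i];
      }
    }
    return parity;
  }

  /** Recovers exactly one erased data unit (marked by a null entry). */
  public byte[] decode(byte[][] dataUnits, byte[] parity) {
    if (dataUnits.length != numDataUnits) {
      throw new IllegalArgumentException("expected " + numDataUnits + " data units");
    }
    byte[] recovered = Arrays.copyOf(parity, parity.length);
    int erased = 0;
    for (byte[] unit : dataUnits) {
      if (unit == null) {
        erased++;
        continue;
      }
      for (int i = 0; i < recovered.length; i++) {
        recovered[i] ^= unit[i];
      }
    }
    if (erased != 1) {
      throw new IllegalArgumentException("XOR can recover exactly one erased unit");
    }
    return recovered;
  }

  public static void main(String[] args) {
    // The schema is chosen here, at construction time, not inside the coder.
    ConfigurableXorCoder coder = new ConfigurableXorCoder(3);
    byte[][] data = {{1, 2}, {3, 4}, {5, 6}};
    byte[] parity = coder.encode(data);
    byte[][] damaged = {data[0], null, data[2]};
    System.out.println(Arrays.equals(coder.decode(damaged, parity), data[1]));
  }
}
```

A (10, 4) layout would then be one possible configuration among others, with the higher-level coder asking a parameterized raw coder for parity instead of embedding its own matrix.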

I saw you uploaded new patches for both the encoder and the decoder. Please merge them into
a single, complete patch so that we can review it further.

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>                 Key: HDFS-7715
>                 URL: https://issues.apache.org/jira/browse/HDFS-7715
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: jack liuquan
>         Attachments: HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf]
> is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been
> shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction. This JIRA
> aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID.

This message was sent by Atlassian JIRA
