hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7715) Implement the Hitchhiker erasure coding algorithm
Date Fri, 03 Apr 2015 16:36:54 GMT

[ https://issues.apache.org/jira/browse/HDFS-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394657#comment-14394657 ]

Kai Zheng commented on HDFS-7715:

Hi [~rashmikv],

I think you made a good point: we should make our interface general enough to cover codes other
than just the HH code. I guess your suggestion is based on HDFS-RAID.
In our new codec & coder framework, how chunks are read from a block is up to the coder's caller,
since the codec and coder deliberately avoid depending on a concrete environment such as HDFS specifics.
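To make that separation concrete, here is a minimal Java sketch, not the actual HDFS-EC code. RawChunkEncoder, XorChunkEncoder and ChunkByChunkCaller are hypothetical names, and a single-parity XOR merely stands in for a real codec such as RS or HH. The coder only ever sees byte arrays, while the caller owns the block streams and, by default, reads one chunk per block at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical coder interface: works on byte arrays only, with no notion of HDFS. */
interface RawChunkEncoder {
    /** Computes parity chunks from equal-length data chunks (one chunk per block). */
    byte[][] encode(byte[][] dataChunks);
}

/** Single-parity XOR coder, a stand-in for a real codec such as RS or HH. */
final class XorChunkEncoder implements RawChunkEncoder {
    @Override
    public byte[][] encode(byte[][] dataChunks) {
        byte[] parity = new byte[dataChunks[0].length];
        for (byte[] chunk : dataChunks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= chunk[i];
            }
        }
        return new byte[][] {parity};
    }
}

/** Hypothetical coder caller: owns all block I/O and decides how chunks are read. */
final class ChunkByChunkCaller {
    /** Reads the next chunk; returns null at end of stream and zero-pads a short final chunk. */
    static byte[] readChunk(InputStream block, int chunkSize) {
        byte[] buf = new byte[chunkSize];
        int filled = 0;
        try {
            while (filled < chunkSize) {
                int n = block.read(buf, filled, chunkSize - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filled == 0 ? null : buf;
    }

    /** Default strategy: read one chunk from every block, encode that stripe, repeat. */
    static List<byte[][]> encodeBlocks(InputStream[] blocks, int chunkSize, RawChunkEncoder coder) {
        List<byte[][]> parities = new ArrayList<>();
        while (true) {
            byte[][] stripe = new byte[blocks.length][];
            boolean gotData = false;
            for (int b = 0; b < blocks.length; b++) {
                byte[] chunk = readChunk(blocks[b], chunkSize);
                gotData |= chunk != null;
                // An exhausted block contributes an all-zero chunk.
                stripe[b] = chunk != null ? chunk : new byte[chunkSize];
            }
            if (!gotData) {
                return parities;
            }
            parities.add(coder.encode(stripe));
        }
    }
}
```

Swapping XorChunkEncoder for another RawChunkEncoder changes nothing in the caller, which is the point of keeping environment specifics out of the coder.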
{{ECBlock}} is an abstraction meant to be extended and customized for HDFS blocks. For now, the
coder caller simply reads a block chunk by chunk by default. HH, and possibly other codes, may
need to read chunk(s) from a block differently during a coding procedure; that's why I suggested
adding a {{readChunk}} method to the {{ECBlock}} class, so that {{HHECBlock}} can customize this
behavior. The offset and len you mentioned would be kept internally in {{HHECBlock}} and then
used to call the actual read method on the real HDFS block. So the underlying implementation will
still have a method that uses these two parameters for the actual read; they just won't appear at
the interface level. I will refine the related code along these lines in a patch to illustrate
the idea. Perhaps you could look at it then to see what I really mean.
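A minimal Java sketch of the proposed hook, assuming a much simplified shape: ECBlock, HHECBlock and readChunk are named in the comment, but the readChunk(int) signature, the position field and the BlockReader interface (standing in for the real HDFS read) are hypothetical, and this is not the actual Hadoop class hierarchy. The point it shows is that offset and len live inside HHECBlock and reach only the underlying read, never the readChunk interface.

```java
/** Hypothetical stand-in for the real HDFS block read; not an actual Hadoop API. */
interface BlockReader {
    /** Copies len bytes starting at block position pos into dst[dstOff..dstOff + len). */
    void read(long pos, byte[] dst, int dstOff, int len);
}

/** Sketch of ECBlock with an overridable readChunk hook (signature assumed). */
class ECBlock {
    protected final BlockReader reader;
    protected long position; // block position where the next chunk starts

    ECBlock(BlockReader reader) {
        this.reader = reader;
    }

    /** Default behaviour: return the whole next chunk. */
    byte[] readChunk(int chunkSize) {
        byte[] chunk = new byte[chunkSize];
        reader.read(position, chunk, 0, chunkSize);
        position += chunkSize;
        return chunk;
    }
}

/** Sketch of HHECBlock: offset and len are kept internally, not passed through readChunk. */
class HHECBlock extends ECBlock {
    private final int offset; // start of the needed range inside each chunk
    private final int len;    // length of the needed range

    HHECBlock(BlockReader reader, int offset, int len) {
        super(reader);
        this.offset = offset;
        this.len = len;
    }

    @Override
    byte[] readChunk(int chunkSize) {
        if (offset < 0 || len < 0 || offset + len > chunkSize) {
            throw new IllegalArgumentException("range does not fit in chunk");
        }
        // Only [offset, offset + len) of the chunk is read from the underlying block.
        byte[] part = new byte[len];
        reader.read(position + offset, part, 0, len);
        position += chunkSize; // the next call still starts at the next chunk boundary
        return part;
    }
}
```

Because the caller only ever calls readChunk, it can loop over a mix of ECBlock and HHECBlock instances without knowing which of them performs the narrower read.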

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>                 Key: HDFS-7715
>                 URL: https://issues.apache.org/jira/browse/HDFS-7715
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: jack liuquan
>         Attachments: 7715-hitchhikerXOR-v2.patch, HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding
> algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O
> by 25%-45% during data reconstruction. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of
> the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 

This message was sent by Atlassian JIRA
