hadoop-common-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
Date Mon, 04 May 2015 14:01:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526652#comment-14526652 ]

Kai Zheng commented on HADOOP-11828:
------------------------------------

Hi Jack, the updated patch looks good to me overall. Some comments so far:
* Some code comments would be better reorganized so that they read well. Some are too long,
and some could be longer.
* Please note lines should not exceed 80 chars. You could set the width limit in your IDE.
* As both the XOR raw coder and the RS raw coder are common to the erasure coders for RS and
HH, please extract the common code that creates the XOR and RS raw coders into an abstract
class to remove the duplication.
* We may need abstract classes like {{HHErasureDecodingStep}} and {{HHErasureEncodingStep}}
for the three derivations of the HH algorithm. Classes like {{HHXORErasureDecodingStep}} can
inherit from them.
* Please try to reuse code between the two versions of coding: the byte[] version and the
ByteBuffer version. You may look at the patch in HADOOP-11847 for some ideas.
* We might not need to override {{testCoding}} and {{performCodingStep}} in {{TestHHErasureCoderBase}}.
Is there anything specific to HH here? If we have to override them, that would be a problem
for using the coder, as it would not be general to use.
* We need Javadocs for the public functions in {{HHUtil}}.
* Is it possible to avoid cloning the input data in {{getPiggyBacksFromInput}}?
* I think we don't need this test, as the configuration isn't specific to the coder.
{code}
+  @Test
+  public void testCodingDirectBufferWithConf_10x4() {
+    /**
+     * This tests if the two configuration items work or not.
+     */
+    Configuration conf = new Configuration();
+    conf.set(CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_RAWCODER_KEY,
+        RSRawErasureCoderFactory.class.getCanonicalName());
+    conf.setBoolean(
+        CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_USEXOR_KEY, false);
+    prepare(conf, 10, 4, null);
+    initHitchhiker();
+    testCoding(true);
+  }
{code}
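
To illustrate the point above about reusing code between the byte[] and ByteBuffer versions, here is a minimal sketch. It assumes a plain XOR encode and uses hypothetical names ({{XorEncodeSketch}}, {{encode}}); it is not the code from this patch or from HADOOP-11847. The idea is that the ByteBuffer version holds the only copy of the coding logic, and the byte[] version wraps its arrays and delegates, which also avoids copying the input.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch only: not the classes from the patch or HADOOP-11847.
public class XorEncodeSketch {

  /**
   * ByteBuffer version, holding the only copy of the coding logic.
   * Assumes every input has at least output.remaining() bytes remaining.
   */
  static void encode(ByteBuffer[] inputs, ByteBuffer output) {
    int len = output.remaining();
    int outPos = output.position();
    for (int i = 0; i < len; i++) {
      byte b = 0;
      for (ByteBuffer in : inputs) {
        // Absolute get: the input positions are left untouched.
        b ^= in.get(in.position() + i);
      }
      // Absolute put: the output position is left untouched as well.
      output.put(outPos + i, b);
    }
  }

  /** byte[] version: wraps the arrays (no copy) and delegates. */
  static void encode(byte[][] inputs, byte[] output) {
    ByteBuffer[] wrapped = new ByteBuffer[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      wrapped[i] = ByteBuffer.wrap(inputs[i]);
    }
    encode(wrapped, ByteBuffer.wrap(output));
  }

  public static void main(String[] args) {
    byte[][] in = {{1, 2, 3}, {4, 5, 6}};
    byte[] out = new byte[3];
    encode(in, out);
    System.out.println(Arrays.toString(out)); // prints [5, 7, 5]
  }
}
```

Because {{ByteBuffer.wrap}} shares the backing array, the result written through the wrapper is visible in the caller's byte[] without any copy back.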

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
>                 Key: HADOOP-11828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11828
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: jack liuquan
>         Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 7715-hitchhikerXOR-v2.patch,
>                      HADOOP-11828-hitchhikerXOR-V3.patch, HADOOP-11828-hitchhikerXOR-V4.patch,
>                      HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf]
> is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been
> shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction. This JIRA
> aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
