hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Thu, 12 Mar 2015 23:34:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359619#comment-14359619
] 

Zhe Zhang commented on HDFS-7337:
---------------------------------

Thanks Kai for the update! The design looks good to me overall.

I also took the chance to look at {{ErasureCodec}} and {{ECSchema}} again. IIUC, {{ErasureCodec}}
is like a factory or a utility class, which creates an {{ErasureCoder}} and a {{BlockGrouper}}
based on an {{ECSchema}}.

If that's the case, I think we can leverage the pattern of {{BlockStoragePolicySuite}}. Something
like:
{code}
public static ECSchemaSuite createDefaultSuite() {
    // Schema IDs double as array indexes, so they are assumed to be 0 and 1.
    // chunkSize is assumed to be defined elsewhere.
    final ECSchema[] schemas = new ECSchema[2];
    final byte RS63 = HdfsConstants.RS63_EC_SCHEMA_ID;
    schemas[RS63] = new ECSchema(RS63,
        HdfsConstants.RS63_EC_SCHEMA_NAME,
        HdfsConstants.RS_EC_ALGORITHM_ID,
        6, 3, chunkSize);
    final byte XOR21 = HdfsConstants.XOR21_EC_SCHEMA_ID;
    schemas[XOR21] = new ECSchema(XOR21,
        HdfsConstants.XOR21_EC_SCHEMA_NAME,
        HdfsConstants.XOR_EC_ALGORITHM_ID,
        2, 1, chunkSize);
    // Assumed constructor, mirroring BlockStoragePolicySuite.
    return new ECSchemaSuite(schemas);
}
{code}

Then the NN can just pass around the schema ID when communicating with DNs and clients, which is
much smaller than an {{ErasureCodec}} object.
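
To make the ID-passing idea concrete, here is a minimal, self-contained sketch of how a receiver might resolve a schema ID back to its parameters. Everything in it is an assumption for illustration: the {{ECSchema}} and {{ECSchemaSuite}} classes are simplified stand-ins rather than the real HDFS classes, and the ID values, schema names, algorithm strings, and chunk size are made up.

```java
// Hypothetical stand-ins for the classes discussed above; not the real HDFS API.
final class ECSchema {
    final byte id;
    final String name;
    final String algorithm;
    final int dataUnits;
    final int parityUnits;
    final int chunkSize;

    ECSchema(byte id, String name, String algorithm,
             int dataUnits, int parityUnits, int chunkSize) {
        this.id = id;
        this.name = name;
        this.algorithm = algorithm;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
        this.chunkSize = chunkSize;
    }
}

final class ECSchemaSuite {
    // Assumed IDs and chunk size, chosen only for this sketch.
    static final byte RS63_ID = 0;
    static final byte XOR21_ID = 1;
    static final int CHUNK_SIZE = 64 * 1024;

    private final ECSchema[] schemas;

    ECSchemaSuite(ECSchema[] schemas) {
        // Defensive copy so callers cannot mutate the suite afterwards.
        this.schemas = schemas.clone();
    }

    static ECSchemaSuite createDefaultSuite() {
        // IDs double as array indexes, as in the snippet above.
        ECSchema[] schemas = new ECSchema[2];
        schemas[RS63_ID] = new ECSchema(RS63_ID, "RS-6-3", "rs", 6, 3, CHUNK_SIZE);
        schemas[XOR21_ID] = new ECSchema(XOR21_ID, "XOR-2-1", "xor", 2, 1, CHUNK_SIZE);
        return new ECSchemaSuite(schemas);
    }

    // Resolve an ID received over the wire; returns null for unknown IDs.
    ECSchema getSchema(byte id) {
        return (id >= 0 && id < schemas.length) ? schemas[id] : null;
    }
}

class SchemaIdDemo {
    public static void main(String[] args) {
        ECSchemaSuite suite = ECSchemaSuite.createDefaultSuite();
        // A single byte is all the sender would need to transmit.
        byte wireId = ECSchemaSuite.RS63_ID;
        ECSchema schema = suite.getSchema(wireId);
        System.out.println(schema.name + ": "
            + schema.dataUnits + " data + " + schema.parityUnits + " parity");
    }
}
```

This only works if both sides build the same suite, so the sketch implicitly assumes the schema definitions are shared out of band, for example through common constants or configuration.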

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec-v2.pdf, PluggableErasureCodec.pdf
>
>
> Following HDFS-7285 and its design, this JIRA proposes supporting multiple erasure codecs
> through a pluggable approach. It allows multiple codec schemas to be defined and configured
> with different coding algorithms and parameters. The resulting codec schemas can then be
> specified for different folders via a command-line tool. While designing and implementing this
> pluggable framework, we will also implement a concrete default codec (Reed-Solomon) to prove
> that the framework is useful and workable. A separate JIRA could be opened for the RS codec
> implementation.
> Note that HDFS-7353 will focus on the low-level codec API and implementation, making
> concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level
> pieces that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
