hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Tue, 21 Mar 2017 02:30:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933994#comment-15933994 ]

Kai Zheng commented on HDFS-7337:
---------------------------------

[~andrew.wang], [~zhz], [~rakeshr], or anybody else:

Keeping things simple and building on the code we already have, the goal here now seems
easier to reach.

In {{ErasureCodingPolicyManager}} we have these built-in EC policies:
{code}
  private static final int DEFAULT_CELLSIZE = 64 * 1024;
  private static final ErasureCodingPolicy SYS_POLICY1 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_6_3_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_6_3_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY2 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_3_2_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_3_2_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY3 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_6_3_LEGACY_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_6_3_LEGACY_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY4 =
      new ErasureCodingPolicy(ErasureCodeConstants.XOR_2_1_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.XOR_2_1_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY5 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_10_4_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_10_4_POLICY_ID);
{code}
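The sketch below shows how these constants relate to the policy names used in configuration, such as {{RS-6-3-64k}} in the HDFS-11314 example. The {{Schema}} and {{Policy}} records are hypothetical stand-ins for {{ErasureCodingPolicy}} and {{ECSchema}}. The codec names are the ones defined in {{ErasureCodeConstants}}. The CODEC-k-m-<cell>k naming rule is an assumption inferred from those configured names, not taken from the HDFS source.
{code:java}
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for ECSchema and ErasureCodingPolicy, for illustration only.
final class PolicyNames {
  record Schema(String codecName, int numDataUnits, int numParityUnits) {}

  record Policy(Schema schema, int cellSize) {
    // Assumed naming rule CODEC-k-m-<cell size in KiB>k, e.g. "RS-6-3-64k".
    String name() {
      return schema.codecName().toUpperCase(Locale.ROOT) + "-"
          + schema.numDataUnits() + "-" + schema.numParityUnits() + "-"
          + (cellSize / 1024) + "k";
    }
  }

  static final int DEFAULT_CELLSIZE = 64 * 1024;

  // Mirrors SYS_POLICY1..SYS_POLICY5 above, all using the 64k default cell size.
  static final List<Policy> BUILT_IN = List.of(
      new Policy(new Schema("rs", 6, 3), DEFAULT_CELLSIZE),
      new Policy(new Schema("rs", 3, 2), DEFAULT_CELLSIZE),
      new Policy(new Schema("rs-legacy", 6, 3), DEFAULT_CELLSIZE),
      new Policy(new Schema("xor", 2, 1), DEFAULT_CELLSIZE),
      new Policy(new Schema("rs", 10, 4), DEFAULT_CELLSIZE));

  public static void main(String[] args) {
    for (Policy p : BUILT_IN) {
      System.out.println(p.name());
    }
  }
}
{code}
Running it prints RS-6-3-64k, RS-3-2-64k, RS-LEGACY-6-3-64k, XOR-2-1-64k and RS-10-4-64k. Every built-in policy shares the same 64k cell size.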

In {{ErasureCodeConstants}} we have these schemas used by the above policies:
{code}
  public static final String RS_CODEC_NAME = "rs";
  public static final String RS_LEGACY_CODEC_NAME = "rs-legacy";
  public static final String XOR_CODEC_NAME = "xor";
  public static final String HHXOR_CODEC_NAME = "hhxor";

  public static final ECSchema RS_6_3_SCHEMA = new ECSchema(
      RS_CODEC_NAME, 6, 3);

  public static final ECSchema RS_3_2_SCHEMA = new ECSchema(
      RS_CODEC_NAME, 3, 2);

  public static final ECSchema RS_6_3_LEGACY_SCHEMA = new ECSchema(
      RS_LEGACY_CODEC_NAME, 6, 3);

  public static final ECSchema XOR_2_1_SCHEMA = new ECSchema(
      XOR_CODEC_NAME, 2, 1);

  public static final ECSchema RS_10_4_SCHEMA = new ECSchema(
      RS_CODEC_NAME, 10, 4);
{code}
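The schema parameters set the trade-off between storage cost and fault tolerance. As plain arithmetic (not HDFS code), a k + m schema stores (k + m) / k bytes per byte of user data. For an MDS code such as Reed-Solomon, or single-parity XOR, it can lose any m of its k + m units. {{SchemaMath}} is a hypothetical helper that just evaluates these two formulas for the distinct schemas above.
{code:java}
import java.util.List;
import java.util.Locale;

// Hypothetical helper: storage overhead and fault tolerance of a k + m schema.
final class SchemaMath {
  record Schema(String label, int numData, int numParity) {
    // Raw bytes stored per byte of user data: (k + m) / k.
    double storageOverhead() {
      return (double) (numData + numParity) / numData;
    }

    // For an MDS code (Reed-Solomon, single-parity XOR) any k of the k + m
    // units suffice to reconstruct the data, so up to m units may be lost.
    int maxLostUnits() {
      return numParity;
    }
  }

  static final List<Schema> SCHEMAS = List.of(
      new Schema("RS 6+3", 6, 3),
      new Schema("RS 3+2", 3, 2),
      new Schema("XOR 2+1", 2, 1),
      new Schema("RS 10+4", 10, 4));

  public static void main(String[] args) {
    for (Schema s : SCHEMAS) {
      System.out.printf(Locale.ROOT, "%-8s overhead %.2fx, tolerates %d lost units%n",
          s.label(), s.storageOverhead(), s.maxLostUnits());
    }
    // Baseline for comparison: 3-way replication costs 3.00x and tolerates 2 lost replicas.
  }
}
{code}
This gives 1.50x/3, 1.67x/2, 1.50x/1 and 1.40x/4 (overhead / tolerated lost units). For comparison, 3-way replication gives 3x/2.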

HDFS-11314 makes it possible to enforce a set of enabled EC policies on the NameNode, as follows:
{code}
<property>
  <name>dfs.namenode.ec.policies.enabled</name>
  <value>RS-6-3-64k, RS-10-4-128k</value>
  <description>Comma-delimited list of enabled erasure coding policies.
    The NameNode will enforce this when setting an erasure coding policy
    on a directory.
  </description>
</property>
{code}
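On the NameNode side, enforcing such a list comes down to two steps. First, parse the comma-delimited value. Then check each requested policy name against it. The sketch below is a hypothetical illustration of that idea, not the actual HDFS-11314 code. The class and method names are made up, and exact, case-sensitive matching after trimming whitespace is an assumption.
{code:java}
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: parse a comma-delimited policy list (as in
// dfs.namenode.ec.policies.enabled) and check requested policies against it.
final class EnabledPolicies {
  private final Set<String> enabled;

  EnabledPolicies(String commaDelimited) {
    // Trim whitespace around each entry and drop empty ones.
    this.enabled = Arrays.stream(commaDelimited.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  boolean isEnabled(String policyName) {
    return enabled.contains(policyName);
  }

  // Mimics rejecting a policy that is not in the enabled set.
  void checkSetPolicy(String dir, String policyName) {
    if (!isEnabled(policyName)) {
      throw new IllegalArgumentException("Policy " + policyName
          + " is not enabled, cannot set it on " + dir + "; enabled: " + enabled);
    }
  }

  public static void main(String[] args) {
    EnabledPolicies policies = new EnabledPolicies("RS-6-3-64k, RS-10-4-128k");
    System.out.println(policies.isEnabled("RS-6-3-64k"));
    System.out.println(policies.isEnabled("RS-3-2-64k"));
  }
}
{code}
The built-in list above shows that all built-in policies use the 64k default cell size. An entry such as {{RS-10-4-128k}} would therefore only match something if a policy with a 128k cell were defined.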

For a given codec, the raw coder implementation to use can be configured as follows, using the {{rs}} codec as
an example:
{code}
<property>
  <name>io.erasurecode.codec.rs.rawcoder</name>
  <value>org.apache.hadoop.io.erasurecode.rawcoder.RSRawErasureCoderFactory</value>
  <description>
    Raw coder implementation for the rs codec. The default value is a
    pure Java implementation. There is also a native implementation. Its value
    is org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory.
  </description>
</property>
{code}
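Resolving the raw coder for a codec is essentially a keyed lookup with a default. This is a minimal sketch under a few assumptions. It uses a plain {{Map}} in place of Hadoop's {{Configuration}}. It also assumes other codecs follow the same {{io.erasurecode.codec.<codec>.rawcoder}} key pattern as {{rs}}. Only the {{rs}} default comes from the property description above, and the {{RawCoderConfig}} class is hypothetical.
{code:java}
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup of the raw coder factory for a codec, assuming every codec
// uses the io.erasurecode.codec.<codec>.rawcoder key pattern shown for rs.
final class RawCoderConfig {
  // Pure Java default for rs, as stated in the property description.
  static final String RS_DEFAULT =
      "org.apache.hadoop.io.erasurecode.rawcoder.RSRawErasureCoderFactory";

  static String keyFor(String codecName) {
    return "io.erasurecode.codec." + codecName + ".rawcoder";
  }

  // Returns the configured factory class name, falling back to the pure Java
  // RS factory for "rs"; other codecs get no default in this sketch.
  static Optional<String> factoryFor(Map<String, String> conf, String codecName) {
    String configured = conf.get(keyFor(codecName));
    if (configured != null && !configured.isBlank()) {
      return Optional.of(configured.trim());
    }
    return "rs".equals(codecName) ? Optional.of(RS_DEFAULT) : Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(keyFor("rs"),
        "org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory");
    System.out.println(factoryFor(conf, "rs").orElse("none"));
    System.out.println(factoryFor(Map.of(), "rs").orElse("none"));
    System.out.println(factoryFor(Map.of(), "xor").orElse("none"));
  }
}
{code}
With the native factory configured for {{rs}}, the first lookup returns that class name. With an empty configuration, {{rs}} falls back to {{RSRawErasureCoderFactory}} and {{xor}} resolves to nothing.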

Given the above, what is still lacking is a mechanism (say, an XML file) that lets admin users
define their own EC schemas and policies on the NameNode side. The reasons to do this:
* Users want to try different codecs;
* Users want to use different codec parameters, e.g. 10 + 4 instead of 6 + 3 for the RS codec;
* Users want to try cell sizes other than 64k.

Yes, it's a nice-to-have. I have heard that some people want to try configurations beyond the
built-in ones available in the code. If this doesn't sound too heavyweight, we can work on it
and get it done in this release cycle.
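The sketch below makes the XML-file idea concrete by showing one way the NameNode might load admin-defined policies. The file format is entirely hypothetical: the {{policies}}/{{policy}} elements and the {{codec}}, {{data}}, {{parity}} and {{cellsize}} attributes. The validation rules are also assumptions: a known codec name from {{ErasureCodeConstants}}, positive unit counts, and a cell size that is a positive multiple of 1024. So is the naming rule, which is reused from the built-in names. Only the JDK's DOM parser is used.
{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical loader for an admin-defined EC policy file (format is made up).
final class UserPolicyLoader {
  // Codec names taken from ErasureCodeConstants.
  static final Set<String> KNOWN_CODECS = Set.of("rs", "rs-legacy", "xor", "hhxor");

  record UserPolicy(String codec, int numData, int numParity, int cellSize) {
    // Assumed naming rule, mirroring names like RS-6-3-64k.
    String name() {
      return codec.toUpperCase(Locale.ROOT) + "-" + numData + "-" + numParity
          + "-" + (cellSize / 1024) + "k";
    }
  }

  static List<UserPolicy> load(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Malformed policy file", e);
    }
    NodeList nodes = doc.getElementsByTagName("policy");
    List<UserPolicy> policies = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      Element e = (Element) nodes.item(i);
      // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
      // for missing or non-numeric attributes.
      UserPolicy p = new UserPolicy(
          e.getAttribute("codec"),
          Integer.parseInt(e.getAttribute("data")),
          Integer.parseInt(e.getAttribute("parity")),
          Integer.parseInt(e.getAttribute("cellsize")));
      validate(p);
      policies.add(p);
    }
    return policies;
  }

  // Sanity checks; the exact rules are assumptions for this sketch.
  static void validate(UserPolicy p) {
    if (!KNOWN_CODECS.contains(p.codec())) {
      throw new IllegalArgumentException("Unknown codec: " + p.codec());
    }
    if (p.numData() <= 0 || p.numParity() <= 0) {
      throw new IllegalArgumentException("Unit counts must be positive: " + p);
    }
    if (p.cellSize() <= 0 || p.cellSize() % 1024 != 0) {
      throw new IllegalArgumentException("Cell size must be a positive multiple of 1024: " + p);
    }
  }

  public static void main(String[] args) {
    String xml = "<policies>"
        + "<policy codec=\"rs\" data=\"10\" parity=\"4\" cellsize=\"131072\"/>"
        + "<policy codec=\"xor\" data=\"2\" parity=\"1\" cellsize=\"65536\"/>"
        + "</policies>";
    for (UserPolicy p : load(xml)) {
      System.out.println(p.name());
    }
  }
}
{code}
Loading this sample file yields policies named RS-10-4-128k and XOR-2-1-64k. Names like these could then be listed in {{dfs.namenode.ec.policies.enabled}}.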

Comments?


> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, PluggableErasureCodec-v3.pdf
>
>
> According to HDFS-7285 and the design, this aims to support multiple erasure codecs via a pluggable approach. It allows
> defining and configuring multiple codec schemas with different coding algorithms and parameters. The resulting codec
> schemas can be used and specified via a command tool for different file folders. While designing and implementing such
> a pluggable framework, a concrete default codec (Reed-Solomon) will also be implemented to prove that the framework is
> useful and workable. A separate JIRA could be opened for the RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and implementation, making concrete vendor libraries
> transparent to the upper layer. This JIRA focuses on the high-level pieces that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


