hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Fri, 17 Jul 2015 22:10:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632005#comment-14632005 ]

Andrew Wang commented on HDFS-7337:

Hi [~drankye], this JIRA resurfaced due to discussion on HDFS-8059 and the need to persist
this information in the fsimage/editlog. I was hoping we could clarify the configuration language
for an EC codec and schema.

I read through the v3 design doc; please let me know if you think the following would work:
Schema (stored on EC zone or INode as a PB so we can evolve it):
* Codec enum (e.g. RS, LRC, etc), which would also have a friendly human-readable name. The
enum is good for efficiency and so the user can only pick from supported codecs.
* List of k,v pairs for configuration. This could be used for k, m, and any other arbitrary
parameters needed by the codec. Very general.
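The schema sketched above (codec enum plus open-ended k,v pairs) might be modeled roughly like this. This is a hypothetical Java sketch for illustration only, not the actual HDFS classes; the enum values, friendly names, and field names are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the EC schema described above: a closed codec enum
// plus an open-ended list of k,v configuration pairs (e.g. k and m for RS).
public class ECSchemaSketch {
    // Closed enum of supported codecs, each with a friendly human-readable name.
    enum Codec {
        REED_SOLOMON("reed-solomon"),
        LRC("lrc");

        final String friendlyName;
        Codec(String friendlyName) { this.friendlyName = friendlyName; }
    }

    final Codec codec;
    final Map<String, String> options = new LinkedHashMap<>();

    ECSchemaSketch(Codec codec) { this.codec = codec; }

    public static void main(String[] args) {
        ECSchemaSketch schema = new ECSchemaSketch(Codec.REED_SOLOMON);
        schema.options.put("k", "6");   // number of data blocks
        schema.options.put("m", "3");   // number of parity blocks
        System.out.println(schema.codec.friendlyName
                + " k=" + schema.options.get("k")
                + " m=" + schema.options.get("m"));
    }
}
```

Keeping the parameters as generic k,v pairs (rather than fixed fields) is what makes the schema evolvable when stored as a PB.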

* Client would validate the codec of a file against the codecs supported in its own software
version. This way, if we add a new codec type, we can restrict old clients from reading it.
* In the client's hdfs-site.xml, we can configure a codec implementation for each codec. This
would look something like {{dfs.client.ec.codec.reed-solomon.impl = org.apache.hadoop....isal}},
telling the client to use ISA-L for reed-solomon.
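A minimal sketch of how that per-codec lookup might work on the client, assuming a Configuration-like string map. The key pattern follows the comment above; the helper names and class names are hypothetical placeholders:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve the implementation class name for a codec
// from a per-codec config key of the form dfs.client.ec.codec.<name>.impl.
public class CodecConfSketch {
    static final String KEY_PREFIX = "dfs.client.ec.codec.";
    static final String KEY_SUFFIX = ".impl";

    static String implKey(String codecName) {
        return KEY_PREFIX + codecName + KEY_SUFFIX;
    }

    // Look up the configured impl, falling back to a built-in default.
    static String resolveImpl(Map<String, String> conf, String codecName,
                              String defaultImpl) {
        return conf.getOrDefault(implKey(codecName), defaultImpl);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Class names below are placeholders, not real Hadoop classes.
        conf.put(implKey("reed-solomon"), "com.example.IsalRSCodec");
        System.out.println(resolveImpl(conf, "reed-solomon", "com.example.JavaRSCodec"));
        System.out.println(resolveImpl(conf, "lrc", "com.example.JavaLRCCodec"));
    }
}
```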

This is just to get us going for phase 1. We'd be restricting users to choosing from a list
of known-good codecs, while they could still provide their own codec implementations as long
as they implement the interfaces.

When we get to the point of fully-pluggable codecs, we can add a special "wildcard" enum value
to support this, and then potentially add new fields to the PB if required. This will require
another HDFS upgrade before we can support full pluggability, but it sounds like we still
need to figure out interfaces for things like block placement and recovery logic anyway.
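One way the client-side version check and a future wildcard value could fit together (again a hypothetical sketch; the enum values and supported set are assumptions, not the actual design):

```java
import java.util.EnumSet;

// Hypothetical sketch: a client build only knows a fixed set of codec enum
// values; a file tagged with a codec outside that set (including a future
// WILDCARD value for fully-pluggable codecs) is rejected by this client.
public class CodecValidationSketch {
    enum Codec { REED_SOLOMON, LRC, WILDCARD }

    // Codecs this (hypothetical) client version knows how to read.
    static final EnumSet<Codec> SUPPORTED = EnumSet.of(Codec.REED_SOLomon_placeholder());

    static Codec REED_SOLomon_placeholder() { return Codec.REED_SOLOMON; }

    static boolean canRead(Codec fileCodec) {
        return SUPPORTED.contains(fileCodec);
    }

    public static void main(String[] args) {
        System.out.println(canRead(Codec.REED_SOLOMON)); // supported by this client
        System.out.println(canRead(Codec.WILDCARD));     // would require a newer client
    }
}
```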

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
PluggableErasureCodec-v2.pdf, PluggableErasureCodec-v3.pdf, PluggableErasureCodec.pdf
> According to HDFS-7285 and its design, this JIRA aims to support multiple Erasure Codecs
> via a pluggable approach. It allows defining and configuring multiple codec schemas with
> different coding algorithms and parameters. The resulting codec schemas can then be specified
> via a command-line tool for different file folders. While designing and implementing this
> pluggable framework, we should also implement a concrete default codec (Reed-Solomon) to
> prove the framework is useful and workable. A separate JIRA could be opened for the RS codec
> implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation, making
> concrete vendor libraries transparent to the upper layer. This JIRA focuses on the
> high-level pieces that interact with configuration, schemas, etc.

This message was sent by Atlassian JIRA
