hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Fri, 13 Mar 2015 06:37:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360018#comment-14360018 ]

Kai Zheng commented on HDFS-7337:
---------------------------------

Thanks [~zhz] for the detailed clarification. It's great that we're largely aligned!
bq.Is there another JIRA handling the management of loaded schemas? If not maybe we can consider
ECSchemaSuite?
I got your point about {{ECSchemaSuite}}. HDFS-7866 is the JIRA that covers the job you
describe. I have some rough code for it in which {{ECSchemaManager}} is the core part. {{ECSchemaManager}}
basically wraps a map containing all the ACTIVE schemas, synced between the NN metadata and the
predefined ones. On one side it lets dfsadmin request reloading of the predefined schemas (by
HADOOP-11664) in an authorization-controlled way; on the other side it serves client requests
for the schema list and detailed definitions. I will rethink the code and see whether {{ECSchemaSuite}}
works better, or borrow its benefits. By the way, HDFS-7859 is used to persist schemas
in NN metadata. Is there anything more we need to fill the gap?
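
To make the description above concrete, here is a minimal sketch of a manager that wraps a map of active schemas. It is an illustration only, not the HDFS-7866 code: every class, field and method name ({{SketchECSchema}}, {{SketchECSchemaManager}}, {{reloadPredefined}}, the boolean privilege flag) is hypothetical, and the rule that already-active schemas are kept on reload is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable schema description; field names are assumptions.
final class SketchECSchema {
    final String name;
    final String codec;
    final int dataUnits;
    final int parityUnits;

    SketchECSchema(String name, String codec, int dataUnits, int parityUnits) {
        this.name = name;
        this.codec = codec;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
    }
}

// Hypothetical manager wrapping a map of the ACTIVE schemas.
final class SketchECSchemaManager {
    private final Map<String, SketchECSchema> active = new LinkedHashMap<>();

    // Admin side: merge predefined schemas into the active set.
    // Assumption: schemas that are already active are left untouched.
    synchronized int reloadPredefined(List<SketchECSchema> predefined, boolean callerIsSuperuser) {
        if (!callerIsSuperuser) {
            throw new SecurityException("reloading schemas requires superuser privilege");
        }
        int added = 0;
        for (SketchECSchema s : predefined) {
            if (active.putIfAbsent(s.name, s) == null) {
                added++;
            }
        }
        return added;
    }

    // Client side: list the names of all active schemas.
    synchronized List<String> listSchemaNames() {
        return new ArrayList<>(active.keySet());
    }

    // Client side: detailed definition of one schema, or null if unknown.
    synchronized SketchECSchema getSchema(String name) {
        return active.get(name);
    }

    public static void main(String[] args) {
        SketchECSchemaManager manager = new SketchECSchemaManager();
        manager.reloadPredefined(Arrays.asList(new SketchECSchema("RS-6-3", "rs", 6, 3)), true);
        System.out.println(manager.listSchemaNames());
    }
}
```

A real manager would also have to load the persisted schemas from NN metadata (HDFS-7859) and check privileges through the NN's own permission machinery rather than a boolean flag.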
bq.Actually even if hard-code all schemas it's still dangerous to pass only the schema ID
I agree. Thanks for the confirmation. With the schema object passed around and available in
the DN and client, we can perform schema-driven encoding and decoding, which will be much safer
and more flexible.
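
As a rough illustration of what "schema-driven" could mean here, the sketch below derives a stripe's buffer layout from the schema object itself, so the receiver needs no local table keyed by schema ID. The names ({{DemoECSchema}}, {{SchemaDrivenDemo}}, {{allocateStripe}}) and the cell-size parameter are hypothetical and do not come from any patch on this JIRA.

```java
// Hypothetical self-describing schema; all names are illustrative assumptions.
final class DemoECSchema {
    final String codecName;
    final int numDataUnits;
    final int numParityUnits;

    DemoECSchema(String codecName, int numDataUnits, int numParityUnits) {
        if (numDataUnits <= 0 || numParityUnits <= 0) {
            throw new IllegalArgumentException("unit counts must be positive");
        }
        this.codecName = codecName;
        this.numDataUnits = numDataUnits;
        this.numParityUnits = numParityUnits;
    }

    int totalUnits() {
        return numDataUnits + numParityUnits;
    }
}

final class SchemaDrivenDemo {
    // The layout comes from the schema passed in, not from a lookup by ID
    // against a local table that might be missing or out of date.
    static byte[][] allocateStripe(DemoECSchema schema, int cellSize) {
        return new byte[schema.totalUnits()][cellSize];
    }

    public static void main(String[] args) {
        DemoECSchema rs = new DemoECSchema("rs", 6, 3);
        byte[][] stripe = allocateStripe(rs, 4);
        System.out.println(stripe.length + " units of " + stripe[0].length + " bytes");
    }
}
```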

Currently I'm working from the bottom up, and hopefully it won't be too long before I reach
the NN and get all the work hooked together.

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
PluggableErasureCodec-v2.pdf, PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and its design, this JIRA considers supporting multiple erasure codecs
via a pluggable approach. It allows multiple codec schemas to be defined and configured with different
coding algorithms and parameters. The resulting codec schemas can then be specified
via a command-line tool for different file folders. While designing and implementing this pluggable
framework, we will also implement a concrete default codec (Reed-Solomon) to prove that the framework
is useful and workable. A separate JIRA could be opened for the RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and implementation, to make
concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level
parts that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
