hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Tue, 06 Jan 2015 21:35:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266816#comment-14266816 ]

Andrew Wang commented on HDFS-7337:
-----------------------------------

Hey Kai, thanks for getting us started here. I gave this a quick look and have a few comments:

* Could you generate normal plaintext diffs rather than a zip? We might also want to reorganize
things into existing packages. The rawcoder stuff could go somewhere in hadoop-common, for
instance, and the block grouper classes could move into blockmanagement.
* I see mixed tabs and spaces; we use spaces only in Hadoop.
* Since the LRC stuff is still up in the air, could we defer everything related to that to
a later JIRA?
* In RSBlockGrouper, using ExtendedBlockId is overkill, since the bpid is the same for everything.

Configuration
* The XML file approach seems potentially error-prone. IIUC, after a set of parameters is
assigned to a schema name, the parameters should never be changed. We thus also need to keep
the XML file in sync between the NN, DN, and client. The client part is especially troublesome.
Are we planning to put this into the editlog/image down the road, like how we do storage policies?
* Also, I think we want to separate out the type of erasure coding from the implementation.
The schema definition from the PDF encodes both together, e.g. JerasureRS. While it's not
possible to change the RS part, the user might want to swap out Jerasure for ISA-L, which should
be allowed. This is sort of like how we did things for encryption; we define a CipherSuite
(e.g. AES-CTR) and then the user can choose among the multiple pluggable implementations for
that cipher.
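
To make the type-versus-implementation split concrete, here is a rough sketch of what I have in mind. All of the names below (CodecType, ECSchemaSketch, RawCoderFactory, NamedFactory, CoderRegistry) are hypothetical and are not taken from the patch or from any existing Hadoop API; the only idea borrowed from the discussion above is that a schema pins the algorithm and its parameters, while the backing library is chosen separately.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: every name here is hypothetical, not a Hadoop API.

/** The coding algorithm. Fixed once a schema has been created. */
enum CodecType { RS, LRC }

/** A schema records the algorithm and its parameters, but not the library. */
final class ECSchemaSketch {
  final String name;
  final CodecType type;
  final int dataUnits;
  final int parityUnits;

  ECSchemaSketch(String name, CodecType type, int dataUnits, int parityUnits) {
    this.name = name;
    this.type = type;
    this.dataUnits = dataUnits;
    this.parityUnits = parityUnits;
  }
}

/** One pluggable implementation of a codec type, e.g. backed by Jerasure or ISA-L. */
interface RawCoderFactory {
  CodecType type();
  String implName();
}

/** Trivial factory used only to exercise the registry in this sketch. */
final class NamedFactory implements RawCoderFactory {
  private final CodecType type;
  private final String implName;

  NamedFactory(CodecType type, String implName) {
    this.type = type;
    this.implName = implName;
  }

  @Override public CodecType type() { return type; }
  @Override public String implName() { return implName; }
}

/** Maps each codec type to the implementations registered for it. */
final class CoderRegistry {
  private final Map<CodecType, Map<String, RawCoderFactory>> impls = new HashMap<>();

  void register(RawCoderFactory factory) {
    impls.computeIfAbsent(factory.type(), t -> new HashMap<>())
        .put(factory.implName(), factory);
  }

  /** The schema fixes the type; the caller (or local config) picks the implementation. */
  RawCoderFactory resolve(ECSchemaSketch schema, String preferredImpl) {
    Map<String, RawCoderFactory> byName = impls.get(schema.type);
    if (byName == null || !byName.containsKey(preferredImpl)) {
      throw new IllegalArgumentException(
          "No implementation '" + preferredImpl + "' for codec type " + schema.type);
    }
    return byName.get(preferredImpl);
  }
}
```

With something like this, a schema such as ("rs-6-3", RS, 6, 3) stays valid if a node switches from a Jerasure-backed factory to an ISA-L-backed one, since only the argument to resolve() changes.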

BlockGroup:
* Zhe told me this is a placeholder class, but a few comments nonetheless.
* Can we just set the two fields in the constructor? They should also be final.
* Since the schema encodes the layout, does SubBlockGroup need to encode both data and parity?
Do we even need SubBlockGroup? Seems like a single array and a schema (a concrete object,
which also encodes the RS or LRC parameters) tells you the layout, which is sufficient. This
will save some memory.
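
As a rough illustration of the single-array idea, something along these lines might be enough. The names (GroupSchema, BlockGroupSketch) are hypothetical, and two things are assumptions on my part rather than anything in the patch: that blocks are laid out data-first, then parity, and that plain long block IDs suffice because the bpid is shared.

```java
import java.util.Arrays;

// Illustrative sketch only: names and layout are assumptions, not the patch's classes.

/** Concrete schema object; here it carries only the RS-style unit counts. */
final class GroupSchema {
  final int dataUnits;
  final int parityUnits;

  GroupSchema(int dataUnits, int parityUnits) {
    this.dataUnits = dataUnits;
    this.parityUnits = parityUnits;
  }
}

/** A block group as one flat array plus the schema that describes its layout. */
final class BlockGroupSketch {
  private final GroupSchema schema;
  // Assumed layout: indices [0, dataUnits) are data, the rest are parity.
  private final long[] blockIds;

  BlockGroupSketch(GroupSchema schema, long[] blockIds) {
    if (blockIds.length != schema.dataUnits + schema.parityUnits) {
      throw new IllegalArgumentException(
          "Expected " + (schema.dataUnits + schema.parityUnits)
              + " blocks, got " + blockIds.length);
    }
    this.schema = schema;
    this.blockIds = blockIds.clone(); // defensive copy keeps the group immutable
  }

  boolean isParity(int index) {
    if (index < 0 || index >= blockIds.length) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    return index >= schema.dataUnits;
  }

  long[] dataBlockIds() {
    return Arrays.copyOfRange(blockIds, 0, schema.dataUnits);
  }

  long[] parityBlockIds() {
    return Arrays.copyOfRange(blockIds, schema.dataUnits, blockIds.length);
  }
}
```

Both fields are final and set in the constructor, and the data/parity split is derived from the schema on demand instead of being stored in a separate SubBlockGroup.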

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this JIRA aims to support multiple erasure codecs
via a pluggable approach. It allows multiple codec schemas to be defined and configured with different
coding algorithms and parameters. The resulting codec schemas can then be specified
via a command-line tool for different file folders. While designing and implementing this pluggable framework,
we will also implement a concrete default codec (Reed-Solomon) to prove the framework
is useful and workable. A separate JIRA could be opened for the RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation, to make
concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level
pieces that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
