hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Mon, 29 Dec 2014 21:02:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260441#comment-14260441 ]

Zhe Zhang commented on HDFS-7337:
---------------------------------

Great work [~drankye]! I went over the design and have the following comments:
# I like the idea of creating an {{ec}} package under {{org.apache.hadoop.hdfs}}. It is a good place to host all codec classes.
# I think the {{ec}} package should focus on codec calculation based on a packet unit. Below is how I think the functions should be logically divided:
#* The {{ErasureCodec}} interface simply provides encode and decode functions that take a {{byte[][]}} and produce another {{byte[][]}}. It should be *unaware* of blocks. For example, I imagine our encode function should look similar to Jerasure's (https://github.com/tsuraan/Jerasure/blob/master/Manual.pdf):
{code}
void jerasure_matrix_encode(int k, int m, int w, int *matrix, char **data_ptrs, char **coding_ptrs, int size);
{code}
#* {{BlockGroups}} should be formed by {{ECManager}}, which calls the encode and decode functions from {{ErasureCodec}} in doing so.
# Logically, {{BlockGroup}} is applicable even without EC, because striping can be done without EC. So an alternative is to put it in the {{protocol}} package.
# I don't think we should reference the schema by name (it wastes space and is fragile). We should look at how other configurable policies (e.g., the block placement algorithm) are loaded. IIRC a factory class is used.
# It's great that we are considering LRC in advance. However, with LEGAL-211 pending, I suggest we keep {{BlockGroup}} simpler for now. For example, it can contain only {{dataBlocks}} and {{parityBlocks}}. When we implement LRC we can subclass or extend it.
# I guess {{ECBlock}} is for testing purposes? An erasure-coded block should have all the properties of a regular block. I think we can just add a couple of flags to the {{Block}} class.
# It's not quite clear to me why we need {{ErasureCoderCallback}}. Is it for asynchronous codec calculation? If codec calculations are done on small packets, I think synchronous operations are fine.
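
The packet-level division proposed in point 2 can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the prototype's API: the method signatures and the {{XorCodec}} class are assumptions, and a single XOR parity unit (m = 1) stands in for Reed-Solomon only to keep the arithmetic short. The codec sees nothing but equally sized {{byte[]}} units, knows nothing about blocks, and runs synchronously, in line with point 7.

```java
// Hypothetical codec interface: works on packet-sized byte arrays only,
// with no knowledge of HDFS blocks or block groups.
interface ErasureCodec {
    int numDataUnits();
    int numParityUnits();

    // data.length == numDataUnits(); all units have the same length.
    byte[][] encode(byte[][] data);

    // units = data units followed by parity units; an erased unit is null.
    // Returns a copy of units with the erased entry reconstructed.
    byte[][] decode(byte[][] units);
}

// Illustrative single-parity codec (RAID-5 style XOR), NOT Reed-Solomon.
// It can recover at most one erased unit.
final class XorCodec implements ErasureCodec {
    private final int k;

    XorCodec(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
    }

    public int numDataUnits() { return k; }
    public int numParityUnits() { return 1; }

    public byte[][] encode(byte[][] data) {
        if (data.length != k) throw new IllegalArgumentException("expected " + k + " data units");
        byte[] parity = new byte[data[0].length];
        for (byte[] unit : data) {
            xorInto(parity, unit); // parity = d0 ^ d1 ^ ... ^ d(k-1)
        }
        return new byte[][] { parity };
    }

    public byte[][] decode(byte[][] units) {
        if (units.length != k + 1) throw new IllegalArgumentException("expected " + (k + 1) + " units");
        int erased = -1;
        int size = -1;
        for (int i = 0; i < units.length; i++) {
            if (units[i] == null) {
                if (erased >= 0) throw new IllegalArgumentException("XOR can recover only one erasure");
                erased = i;
            } else {
                size = units[i].length;
            }
        }
        byte[][] out = units.clone();
        if (erased < 0) return out; // nothing to repair
        // The XOR of all surviving units (data and parity) equals the missing unit.
        byte[] rebuilt = new byte[size];
        for (int i = 0; i < units.length; i++) {
            if (i != erased) xorInto(rebuilt, units[i]);
        }
        out[erased] = rebuilt;
        return out;
    }

    private static void xorInto(byte[] acc, byte[] unit) {
        if (unit.length != acc.length) throw new IllegalArgumentException("unit sizes differ");
        for (int j = 0; j < acc.length; j++) {
            acc[j] ^= unit[j];
        }
    }
}
```

Decoding treats a {{null}} entry as an erased unit. A real RS codec would keep the same shape with more than one parity unit, while grouping units into blocks stays in {{ECManager}}.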
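
Point 4 suggests loading the codec through a factory rather than referencing a schema by name. Below is a self-contained sketch under stated assumptions: a plain {{Map}} stands in for Hadoop's {{Configuration}}, and the key {{dfs.ec.codec.class}}, the {{PluggableCodec}} interface, and the {{CodecFactory}} and {{DefaultRsCodec}} classes are all hypothetical, as are the illustrative (6, 3) parameters. In Hadoop itself this pattern is usually written with {{Configuration.getClass()}} and {{ReflectionUtils.newInstance()}}.

```java
// Hypothetical minimal codec contract for this sketch.
interface PluggableCodec {
    int numDataUnits();
    int numParityUnits();
}

// Hypothetical default implementation standing in for an RS codec;
// the (6, 3) parameters are illustrative only.
final class DefaultRsCodec implements PluggableCodec {
    public DefaultRsCodec() {}
    public int numDataUnits() { return 6; }
    public int numParityUnits() { return 3; }
}

// Factory: resolves the implementation class from configuration and
// instantiates it reflectively, falling back to a default class.
final class CodecFactory {
    // Hypothetical configuration key, not an actual HDFS property.
    static final String CODEC_CLASS_KEY = "dfs.ec.codec.class";

    private CodecFactory() {}

    static PluggableCodec newCodec(java.util.Map<String, String> conf) {
        String className = conf.getOrDefault(CODEC_CLASS_KEY, DefaultRsCodec.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            // Fail fast if the configured class is not a codec.
            if (!PluggableCodec.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(className + " does not implement PluggableCodec");
            }
            return (PluggableCodec) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load codec " + className, e);
        }
    }
}
```

Rejecting a class that does not implement the interface surfaces a misconfiguration when the codec is loaded rather than at the first encode call.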
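
Points 3 and 5 suggest keeping {{BlockGroup}} minimal and independent of EC. A hypothetical sketch, assuming a simplified {{SimpleBlock}} stand-in for the HDFS {{Block}} class: the group holds only data blocks and parity blocks, an empty parity array covers plain striping without EC, and the class is left non-final so that an LRC variant could extend it later.

```java
// Hypothetical stand-in for an HDFS block; only the fields this sketch needs.
final class SimpleBlock {
    final long blockId;
    final long numBytes;

    SimpleBlock(long blockId, long numBytes) {
        this.blockId = blockId;
        this.numBytes = numBytes;
    }
}

// Minimal block group: just data blocks and parity blocks.
// Non-final so a richer layout (e.g. LRC local parities) could subclass it.
class BlockGroup {
    private final SimpleBlock[] dataBlocks;
    private final SimpleBlock[] parityBlocks;

    BlockGroup(SimpleBlock[] dataBlocks, SimpleBlock[] parityBlocks) {
        if (dataBlocks.length == 0) throw new IllegalArgumentException("need at least one data block");
        // Defensive copies keep the group immutable from the outside.
        this.dataBlocks = dataBlocks.clone();
        this.parityBlocks = parityBlocks.clone(); // may be empty: striping without EC
    }

    SimpleBlock[] getDataBlocks() { return dataBlocks.clone(); }
    SimpleBlock[] getParityBlocks() { return parityBlocks.clone(); }

    // Total number of blocks stored for this group.
    int size() { return dataBlocks.length + parityBlocks.length; }
}
```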

Thanks!

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and its design, this issue proposes supporting multiple erasure codecs through a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resulting codec schemas can then be specified via a command-line tool for different directories. While designing and implementing this pluggable framework, a concrete default codec (Reed-Solomon) will also be implemented to prove that the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and implementation, making concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level pieces that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
