hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
Date Fri, 09 Jan 2015 22:46:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272010#comment-14272010 ]

Kai Zheng commented on HDFS-7337:

Continuing to address Zhe's comments above.
bq. I guess ECBlock is for testing purpose? An erasure coded block should have all properties
of a regular block. I think we can just add a couple of flags to the Block class.
You're right that the ECBlock class isn't finalized; the whole bundle of code was attached for review and discussion. I'm working on decoupling ECBlock from the HDFS block. This is possible because the codec framework already has a nice arrangement for delegating how chunks (ECChunk) are pulled/extracted from an ECBlock: it's the caller's (ECWorker's or ECClient's) responsibility to handle how byte chunks are extracted/collected from an actual HDFS block. Once decoupled, ECBlock (or a similar class) will be very lightweight and won't need many fields at all. I will post new code for further discussion.
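The decoupling described above might be sketched roughly as follows; the field and method names here are assumptions drawn from this comment, not the final API:

```java
import java.nio.ByteBuffer;

/** Sketch: an ECChunk wraps a buffer of bytes fed to or produced by a coder. */
class ECChunk {
    private final ByteBuffer buffer;
    ECChunk(ByteBuffer buffer) { this.buffer = buffer; }
    ByteBuffer getBuffer() { return buffer; }
}

/**
 * Sketch: a lightweight ECBlock decoupled from the HDFS Block class.
 * It only identifies a block's position within a block group; the caller
 * (ECWorker or ECClient) is responsible for mapping it to actual HDFS
 * block data and extracting ECChunks from it.
 */
class ECBlock {
    private final int indexInGroup;
    private final boolean parity;

    ECBlock(int indexInGroup, boolean parity) {
        this.indexInGroup = indexInGroup;
        this.parity = parity;
    }

    int getIndexInGroup() { return indexInGroup; }
    boolean isParity() { return parity; }
}
```

Because all block-data access is delegated to the caller, the class needs no replica, generation-stamp, or storage fields.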
bq. It's not quite clear to me why we need ErasureCoderCallback. Is it for async codec calculation?
If codec calculations are done on small packets, I think sync operations are fine.
ErasureCoderCallback could perhaps be better named to avoid this confusion. It's not related to sync vs. async; it basically lets the codec caller (ECWorker or ECClient) control how chunks are obtained from blocks. The codec calls it to pull chunks from blocks, so it can be regarded as a data source provider. In the ECWorker transforming case, many chunks may be pulled from the blocks being transformed, so the underlying byte-level encode() or decode() in the raw coder can be called many times in a while loop. In the ECClient striping case it's similar, continuing until the application finishes writing/reading data for a BlockGroup.
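The pull-style interaction described above might look like the following minimal sketch; the interface name, the trivial XOR "coder", and the null-terminated protocol are illustrative assumptions, not the patch's actual API:

```java
/**
 * Sketch of a pull-style chunk provider: the coder repeatedly asks the
 * callback for the next chunk until the callback reports the block group
 * is exhausted (here signalled by null).
 */
interface ChunkSource {
    /** @return the next chunk of bytes, or null when no chunks remain. */
    byte[] nextChunk();
}

class XorEncoder {
    /**
     * Pull chunks in a while loop and XOR-fold each one into a parity
     * buffer, mirroring how a raw coder's encode() could be invoked
     * repeatedly as chunks arrive.
     */
    static byte[] encode(ChunkSource source, int chunkSize) {
        byte[] parity = new byte[chunkSize];
        byte[] chunk;
        while ((chunk = source.nextChunk()) != null) { // pull until exhausted
            for (int i = 0; i < chunkSize; i++) {
                parity[i] ^= chunk[i];
            }
        }
        return parity;
    }
}
```

With data chunks {1, 2} and {3, 4}, the folded parity is {2, 6}; a real Reed-Solomon coder would replace the XOR but keep the same pull loop.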

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
> According to HDFS-7285 and the design, this task will support multiple erasure codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resulting codec schemas can then be specified via a command-line tool for different file folders. While designing and implementing this pluggable framework, we will also implement a concrete default codec (Reed-Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and implementation, making concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level parts that interact with configuration, schemas, etc.
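A hypothetical sketch of the "codec schema" idea the description mentions: a named schema binding a codec to its parameters, so several schemas (e.g. different Reed-Solomon configurations) can coexist and be selected per folder. All names and fields below are assumptions for illustration, not the issue's actual design:

```java
import java.util.Collections;
import java.util.Map;

/** Illustrative schema object: name + codec + coding parameters. */
class ECSchema {
    private final String schemaName;            // e.g. "RS-6-3" (assumed)
    private final String codecName;             // e.g. "rs" (assumed)
    private final int numDataUnits;             // data blocks per group
    private final int numParityUnits;           // parity blocks per group
    private final Map<String, String> options;  // extra codec parameters

    ECSchema(String schemaName, String codecName,
             int numDataUnits, int numParityUnits,
             Map<String, String> options) {
        this.schemaName = schemaName;
        this.codecName = codecName;
        this.numDataUnits = numDataUnits;
        this.numParityUnits = numParityUnits;
        this.options = Collections.unmodifiableMap(options);
    }

    String getSchemaName() { return schemaName; }
    String getCodecName() { return codecName; }
    int getNumDataUnits() { return numDataUnits; }
    int getNumParityUnits() { return numParityUnits; }
    Map<String, String> getOptions() { return options; }
}
```

A command-line tool could then resolve a schema by name and attach it to a folder, leaving the codec implementation behind the pluggable boundary.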

