hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7339) Create block groups for initial block encoding
Date Thu, 06 Nov 2014 08:06:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199972#comment-14199972 ]

Vinayakumar B commented on HDFS-7339:

I have merged the latest trunk commits into HDFS-EC.

> Create block groups for initial block encoding
> ----------------------------------------------
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: Encoding-design-NN.jpg, HDFS-7339-001.patch
> All erasure codec operations center around the concept of _block groups_, which are formed
in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class
to record the original and parity blocks in an encoding group, as well as a pointer to the
codec schema. Pluggable codec schemas will be supported in HDFS-7337. 
> The NameNode creates and maintains {{BlockGroup}} instances through 2 new components;
the attached figure has an illustration of the architecture.
> {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a
simple example, it stores the codec schema of the Reed-Solomon algorithm with 3 original and 2
parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses.
To facilitate lookups during recovery requests, {{BlockGroups}} should be organized as a
map keyed by {{Blocks}}.
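
The {{ECManager}} description above can be sketched roughly as follows. This is only an illustration of the data structure, not the code in HDFS-7339-001.patch: the names CodecSchema and ECManagerSketch, the use of plain long block IDs as map keys, and every method signature are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical codec schema: algorithm name plus original/parity block counts. */
final class CodecSchema {
    final String algorithm;
    final int dataBlocks;
    final int parityBlocks;

    CodecSchema(String algorithm, int dataBlocks, int parityBlocks) {
        this.algorithm = algorithm;
        this.dataBlocks = dataBlocks;
        this.parityBlocks = parityBlocks;
    }

    /** Total number of blocks in a group that uses this schema. */
    int groupSize() {
        return dataBlocks + parityBlocks;
    }
}

/** Records the original and parity blocks of one encoding group. */
final class BlockGroup {
    final List<Long> dataBlockIds;
    final List<Long> parityBlockIds;
    final CodecSchema schema; // pointer to the shared schema, not a copy

    BlockGroup(List<Long> data, List<Long> parity, CodecSchema schema) {
        if (data.size() != schema.dataBlocks || parity.size() != schema.parityBlocks) {
            throw new IllegalArgumentException("block counts do not match schema");
        }
        this.dataBlockIds = Collections.unmodifiableList(new ArrayList<Long>(data));
        this.parityBlockIds = Collections.unmodifiableList(new ArrayList<Long>(parity));
        this.schema = schema;
    }
}

/** Toy stand-in for ECManager: maps every member block to its group. */
final class ECManagerSketch {
    // Assumption: blocks are identified by a long ID and used directly as keys.
    private final Map<Long, BlockGroup> groupByBlock = new HashMap<Long, BlockGroup>();

    void addGroup(BlockGroup group) {
        for (Long id : group.dataBlockIds) {
            groupByBlock.put(id, group);
        }
        for (Long id : group.parityBlockIds) {
            groupByBlock.put(id, group);
        }
    }

    /** Returns the group containing the block, or null if the block is in no group. */
    BlockGroup lookup(long blockId) {
        return groupByBlock.get(blockId);
    }
}
```

Indexing both original and parity blocks means a recovery request for any lost member finds its group in a single map lookup, at the cost of one map entry per block.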
> {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module
analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create
parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority
queues to maintain the relative order of encoding and replication tasks.
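
To make the queueing idea concrete, here is a toy model of a leveled work queue with one extra level for initial encoding. It is not the real {{UnderReplicatedBlocks}} class: LeveledQueuesSketch and its methods are hypothetical, and placing {{QUEUE_INITIAL_ENCODING}} after the five existing levels (so it is served last) is an assumption, since the description does not say where the new queue sits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy multi-level queue; lower index means higher priority. */
final class LeveledQueuesSketch {
    static final int EXISTING_LEVELS = 5;       // "the 5 existing priority queues"
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    // Assumption: the new queue is appended after the existing levels.
    static final int QUEUE_INITIAL_ENCODING = EXISTING_LEVELS;
    static final int LEVELS = EXISTING_LEVELS + 1;

    private final List<Queue<Long>> queues = new ArrayList<Queue<Long>>();

    LeveledQueuesSketch() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new ArrayDeque<Long>());
        }
    }

    /** Enqueues a block ID at the given priority level. */
    void add(long blockId, int level) {
        if (level < 0 || level >= LEVELS) {
            throw new IllegalArgumentException("unknown level " + level);
        }
        queues.get(level).add(blockId);
    }

    /** Removes and returns the next block, scanning from level 0 upward; null if all are empty. */
    Long poll() {
        for (Queue<Long> q : queues) {
            Long id = q.poll();
            if (id != null) {
                return id;
            }
        }
        return null;
    }
}
```

Keeping encoding and replication tasks in one leveled structure means a single scan decides which task runs next, which is how the relative order of the two kinds of work is preserved in this sketch.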
> * Whenever a block is finalized and meets EC criteria -- including 1) block size is full;
2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}.
In order to do so it needs to store a set of blocks waiting to be encoded. Different grouping
algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group
should also reside on different DataNodes, and ideally on different racks, to tolerate node
and rack failures. If successful, it records the formed group with {{ECManager}} and inserts
the parity blocks into {{QUEUE_INITIAL_ENCODING}}.
> * When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}}
adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks
in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might
be added for fine-grained differentiation (e.g., loss of a raw block versus a parity one).
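
The group-forming step in the first bullet could look something like the sketch below. It is a deliberately simplified greedy strategy, not the grouping algorithm proposed in the patch: PendingBlock and GroupFormerSketch are hypothetical names, each block is assumed to have a single DataNode location (real HDFS blocks have several replicas), and rack awareness and the same-file constraint are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical finalized block waiting to be encoded. */
final class PendingBlock {
    final long blockId;
    final String dataNode; // simplification: one location per block

    PendingBlock(long blockId, String dataNode) {
        this.blockId = blockId;
        this.dataNode = dataNode;
    }
}

/** Greedy group former: picks the oldest waiting blocks that sit on distinct DataNodes. */
final class GroupFormerSketch {
    private final int dataBlocksPerGroup;
    private final List<PendingBlock> pending = new ArrayList<PendingBlock>();

    GroupFormerSketch(int dataBlocksPerGroup) {
        this.dataBlocksPerGroup = dataBlocksPerGroup;
    }

    int pendingCount() {
        return pending.size();
    }

    /** Adds a finalized block; returns a full group if one can now be formed, else null. */
    List<PendingBlock> offer(PendingBlock block) {
        pending.add(block);
        List<PendingBlock> picked = new ArrayList<PendingBlock>();
        Set<String> usedNodes = new HashSet<String>();
        for (PendingBlock b : pending) {
            // Set.add returns false if this DataNode already holds a picked block.
            if (usedNodes.add(b.dataNode)) {
                picked.add(b);
                if (picked.size() == dataBlocksPerGroup) {
                    break;
                }
            }
        }
        if (picked.size() < dataBlocksPerGroup) {
            return null; // not enough distinct DataNodes yet; keep waiting
        }
        pending.removeAll(picked);
        return picked;
    }
}
```

In this sketch a caller would, on a non-null result, record the group with {{ECManager}} and enqueue its parity blocks in {{QUEUE_INITIAL_ENCODING}}; blocks that share a DataNode with an already picked block simply stay in the waiting set for a later group.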

This message was sent by Atlassian JIRA
