hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
Date Fri, 23 Jan 2015 21:11:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289983#comment-14289983 ]

Tsz Wo Nicholas Sze commented on HDFS-7339:

> The main reason for creating a BlockGroup class and the hierarchical block ID protocol
is to minimize NN memory overhead. ...

This can be achieved by using consecutive (normal) block IDs for the blocks in a block group
without dividing the ID space; see below.  (This is not easy to describe.  Please let me
know if anything is unclear.)
- For the block groups stored in namenode, only store the first block ID.  The other block
IDs can be deduced with the storage policy.
- Use the same generation stamp for all the blocks.
- How to support lookups in BlocksMap?  There are several ways described below.
-# Change the hash function so that consecutive IDs are mapped to the same hash value
and implement BlockGroup.equal(..) so that it returns true for any block ID in the group.
 For example, we may use only the high 60 bits for computing the hash code.  Suppose the blocks
in a block group have IDs from 0x302 to 0x30A.  We will be able to look up the block group using
any of the block IDs.  What happens if the first ID is near the low 4-bit boundary, say 0x30D?
 We may simply skip to 0x310 when allocating the block IDs so that the group does not cross the boundary.
-# We may also store the first ID (or the offset to the first ID) in the datanode for EC blocks.
 This does not seem to be a good solution.
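
To make the first lookup option concrete, here is a minimal sketch, not the actual BlocksMap code. GroupKey and BlockGroupLookupSketch are hypothetical names, and a plain java.util.HashMap stands in for BlocksMap. The key hashes and compares on the high 60 bits only, so any ID in 0x302..0x30A finds the entry stored under 0x302.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup key: hashes and compares on the high 60 bits of a block ID. */
final class GroupKey {
    static final int GROUP_BITS = 4; // low bits that index a block within its group

    private final long blockId;

    GroupKey(long blockId) {
        this.blockId = blockId;
    }

    /** The high 60 bits, shared by all IDs in the same 16-ID range. */
    private long prefix() {
        return blockId >>> GROUP_BITS;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(prefix());
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof GroupKey && ((GroupKey) o).prefix() == prefix();
    }
}

final class BlockGroupLookupSketch {
    public static void main(String[] args) {
        // A plain HashMap stands in for BlocksMap (an assumption of this sketch).
        Map<GroupKey, String> blocksMap = new HashMap<>();
        blocksMap.put(new GroupKey(0x302L), "group starting at 0x302");

        // Every block ID in the group resolves to the same entry.
        for (long id = 0x302L; id <= 0x30AL; id++) {
            System.out.println("0x" + Long.toHexString(id) + " -> "
                + blocksMap.get(new GroupKey(id)));
        }
        // An ID in the next 16-ID range does not match.
        System.out.println("0x310 -> " + blocksMap.get(new GroupKey(0x310L)));
    }
}
```

Note that this simplified key also matches 0x300, 0x301 and 0x30B..0x30F, which are not in the group; a real equal(..) would additionally have to check the ID against the group's first ID and size.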

If we enforce block ID allocation so that the lower 4 bits of the first ID must be zero, then
it is very similar to the scheme proposed in the design doc, except that there is no notion of
block group in the block IDs.
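
A rough sketch of the aligned allocation follows, under the assumption that a group has at most 16 blocks. AlignedBlockIdAllocator and its methods are hypothetical names, not existing HDFS classes. The allocator rounds the first ID of each group up to a multiple of 16, so the first ID and the index of any block can be recovered from the block ID alone.

```java
/** Hypothetical allocator: the first block ID of every group has its low 4 bits zero. */
final class AlignedBlockIdAllocator {
    static final int GROUP_BITS = 4;
    static final long GROUP_SPAN = 1L << GROUP_BITS; // 16 IDs per aligned range
    static final long LOW_MASK = GROUP_SPAN - 1;     // 0xF

    private long nextId;

    AlignedBlockIdAllocator(long startId) {
        this.nextId = startId;
    }

    /** Returns the first ID of a new group, skipping ahead to the next multiple of 16. */
    long allocateGroup(int blocksInGroup) {
        if (blocksInGroup < 1 || blocksInGroup > GROUP_SPAN) {
            throw new IllegalArgumentException("group must have 1.." + GROUP_SPAN + " blocks");
        }
        long first = (nextId + LOW_MASK) & ~LOW_MASK; // round up to a 16-ID boundary
        nextId = first + blocksInGroup;               // later IDs continue after the group
        return first;
    }

    /** First ID of the group that would contain blockId. */
    static long firstIdOf(long blockId) {
        return blockId & ~LOW_MASK;
    }

    /** Position of blockId within its group. */
    static int indexInGroup(long blockId) {
        return (int) (blockId & LOW_MASK);
    }

    public static void main(String[] args) {
        AlignedBlockIdAllocator allocator = new AlignedBlockIdAllocator(0x30DL);
        long first = allocator.allocateGroup(9); // 0x30D is skipped in favor of 0x310
        System.out.println("first ID: 0x" + Long.toHexString(first));
        System.out.println("group of 0x318 starts at 0x"
            + Long.toHexString(firstIdOf(0x318L)) + ", index " + indexInGroup(0x318L));
    }
}
```

With this alignment the namenode only needs to store the first ID, and the IDs 0x30D..0x30F skipped in the example are simply left unused.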

> Allocating and persisting block groups in NameNode
> --------------------------------------------------
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, HDFS-7339-003.patch, HDFS-7339-004.patch,
HDFS-7339-005.patch, HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg
> All erasure codec operations center around the concept of _block group_; they are formed
in initial encoding and looked up in recoveries and conversions. A lightweight class {{BlockGroup}}
is created to record the original and parity blocks in a coding group, as well as a pointer
to the codec schema (pluggable codec schemas will be supported in HDFS-7337). With the striping
layout, the HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. Therefore
we propose to extend a file’s inode to switch between _contiguous_ and _striping_ modes,
with the current mode recorded in a binary flag. An array of BlockGroups (or BlockGroup IDs)
is added, which remains empty for “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new {{ECManager}}
component; the attached figure has an illustration of the architecture. As a simple example,
when a {_Striping+EC_} file is created and written to, it will serve requests from the client
to allocate new {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase,
{{BlockGroups}} are allocated both in initial online encoding and in the conversion from replication
to EC. {{ECManager}} also facilitates the lookup of {{BlockGroup}} information for block recovery

