hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7437) Storing block ids instead of BlockInfo object in INodeFile
Date Tue, 02 Dec 2014 00:14:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230738#comment-14230738 ]

Haohui Mai commented on HDFS-7437:
----------------------------------

In the current implementation, there is an implicit dependency between {{INodeFile}} and the
block management layer. An {{INodeFile}} instance contains a list of {{BlockInfo}} objects
that identify the blocks the file contains. These {{BlockInfo}} objects also contain
information about (1) the locations of the blocks on DNs, and (2) the pipeline-related state
of the block (e.g., {{BlockInfoUnderConstruction}}).
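
For illustration, the coupling described above can be sketched roughly as follows. This is a
simplified model and not the actual HDFS source: the class names mirror the ones mentioned
here, but the fields, constructors, and the {{CoupledSketch}} wrapper are assumptions made
for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the current (coupled) layout.
// These are NOT the real HDFS classes; fields and constructors are assumptions.
class CoupledSketch {
    // Namespace-side object: holds BlockInfo references directly.
    static class INodeFile {
        final long id;
        final short replication;
        final List<BlockInfo> blocks = new ArrayList<>();

        INodeFile(long id, short replication) {
            this.id = id;
            this.replication = replication;
        }

        short getReplication() {
            return replication;
        }
    }

    // Block-management-side object: block identity, locations, and a
    // back-reference to the owning file.
    static class BlockInfo {
        final long blockId;
        final INodeFile file;                              // direct reference into the namespace
        final List<String> datanodes = new ArrayList<>();  // block locations on DNs

        BlockInfo(long blockId, INodeFile file) {
            this.blockId = blockId;
            this.file = file;
        }

        // Replication is only reachable by following the reference to the file.
        short getReplication() {
            return file.getReplication();
        }
    }

    public static void main(String[] args) {
        INodeFile file = new INodeFile(1001L, (short) 3);
        BlockInfo b = new BlockInfo(1L, file);
        file.blocks.add(b);
        b.datanodes.add("dn1");
        System.out.println(b.getReplication());
    }
}
```

In this model each side holds object references to the other, so neither can be moved or
replaced without touching the other.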

The v8 patch is a combined patch that breaks the implicit dependency between {{INodeFile}}
and the block management layer. This is a prerequisite for further work on the block management
layer, such as a standalone block manager (HDFS-5477) and off-heap data structures for block
management (HDFS-7244).

The scope of the changes is as follows:

* A {{BlockInfo}} object contains the inode id of the {{INodeFile}} instead of a direct reference
to the {{INodeFile}}. The object also stores the replication factor, which in the
current implementation is available through {{BlockCollection#getReplication()}}.
* An {{INodeFile}} object stores {{Block}} objects instead of {{BlockInfo}} objects. A
{{Block}} object contains only the block id, size, and generation stamp of the block.
* When operations need information that was previously available from the {{BlockInfo}} objects
stored in {{INodeFile}}, they have to look it up by calling {{BlockManager#getStoredBlock()}}.
* Information stored in corresponding {{Block}} / {{BlockInfo}} pairs, such as the size of the
blocks and their generation stamps, is updated consistently.
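
The changes listed above can be sketched roughly as follows. This is a minimal illustration
under stated assumptions, not code from the patch: the class names follow the ones used in
this comment, but the {{DecoupledSketch}} wrapper, the field names, the map-based lookup, and
the {{addBlock()}} and {{updateLength()}} helpers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the decoupled layout.
// These are NOT the real HDFS classes; fields and helpers are assumptions.
class DecoupledSketch {
    // Identity only: block id, size, and generation stamp.
    static class Block {
        final long blockId;
        long numBytes;
        long generationStamp;

        Block(long blockId, long numBytes, long generationStamp) {
            this.blockId = blockId;
            this.numBytes = numBytes;
            this.generationStamp = generationStamp;
        }
    }

    // Block-management-side record: refers to the file by inode id only
    // and carries its own copy of the replication factor.
    static class BlockInfo extends Block {
        final long inodeId;
        final short replication;

        BlockInfo(Block b, long inodeId, short replication) {
            super(b.blockId, b.numBytes, b.generationStamp);
            this.inodeId = inodeId;
            this.replication = replication;
        }
    }

    // Namespace-side object: stores plain Block objects, no BlockInfo.
    static class INodeFile {
        final long id;
        final Block[] blocks;

        INodeFile(long id, Block... blocks) {
            this.id = id;
            this.blocks = blocks;
        }
    }

    static class BlockManager {
        private final Map<Long, BlockInfo> blocksMap = new HashMap<>();

        // Hypothetical helper: register a BlockInfo under its block id.
        void addBlock(BlockInfo info) {
            blocksMap.put(info.blockId, info);
        }

        // Stand-in for BlockManager#getStoredBlock(): look up by block id.
        BlockInfo getStoredBlock(Block b) {
            return blocksMap.get(b.blockId);
        }

        // Hypothetical helper: keep the file's Block and its BlockInfo in sync.
        void updateLength(Block b, long numBytes) {
            b.numBytes = numBytes;
            BlockInfo info = getStoredBlock(b);
            if (info != null) {
                info.numBytes = numBytes;
            }
        }
    }

    public static void main(String[] args) {
        Block blk = new Block(1L, 0L, 100L);
        INodeFile file = new INodeFile(1001L, blk);
        BlockManager bm = new BlockManager();
        bm.addBlock(new BlockInfo(blk, file.id, (short) 3));

        // The namespace reaches block-management state only through a lookup.
        BlockInfo info = bm.getStoredBlock(file.blocks[0]);
        System.out.println(info.inodeId + " " + info.replication);
    }
}
```

In this sketch the only link between the two sides is the block id (namespace to block
manager) and the inode id (block manager to namespace), which is what would allow the block
manager's storage to be swapped out behind the lookup.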



> Storing block ids instead of BlockInfo object in INodeFile
> ----------------------------------------------------------
>
>                 Key: HDFS-7437
>                 URL: https://issues.apache.org/jira/browse/HDFS-7437
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-7437.000.patch, HDFS-7437.001.patch, HDFS-7437.002.patch, HDFS-7437.003.patch,
> HDFS-7437.004.patch, HDFS-7437.005.patch, HDFS-7437.006.patch, HDFS-7437.007.patch, HDFS-7437.008.patch
>
>
> Currently {{INodeFile}} stores its list of blocks as references to {{BlockInfo}} objects instead
> of block ids. This creates an implicit dependency between the namespace and the block manager.
> The dependency blocks several recent efforts, such as separating the block manager out
> as a standalone service, moving block information off heap, and optimizing the memory usage
> of the block manager.
> This jira proposes to remove the dependency by storing block ids instead of object
> references in {{INodeFile}} objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
