hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-512) Set block id as the key to Block
Date Mon, 03 Aug 2009 19:45:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738533#action_12738533 ]

Konstantin Shvachko commented on HDFS-512:

Most of the maps Nicholas listed above are {{HashMap}}s. They are based on the {{Block.hash()}}
method, which the patch does not modify and which has never used the generation stamp in calculating
a block's hash. I found only 4 collections that use {{TreeSet<Block>}} or {{TreeMap}} with
{{Block}} as the key. Here they are:
# UnderReplicatedBlocks
# BlockManager.excessReplicateMap
# CorruptReplicasMap
# DatanodeDescriptor.invalidateBlocks

None of them needs to know about the generation stamp.
I think it is safe to make the change. We should commit it to the append branch.
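
To illustrate the point about sorted collections, here is a minimal sketch. It assumes a hypothetical {{SimpleBlock}} class (a simplified stand-in, not the real {{Block}} and not the code in the patch) whose {{compareTo()}}, {{equals()}} and {{hashCode()}} all look at the block id only. With such a key, a {{TreeSet}} finds an entry without knowing its generation stamp and holds at most one entry per block id.

```java
import java.util.TreeSet;

public class BlockKeySketch {
    /** Hypothetical, simplified stand-in for Block; not the real HDFS class. */
    static final class SimpleBlock implements Comparable<SimpleBlock> {
        final long blockId;
        final long generationStamp; // carried along, but not part of the key

        SimpleBlock(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }

        // Ordering uses the block id only, so TreeSet/TreeMap lookups
        // do not depend on the generation stamp.
        @Override
        public int compareTo(SimpleBlock other) {
            return Long.compare(blockId, other.blockId);
        }

        // Kept consistent with compareTo: equality is by block id only.
        @Override
        public boolean equals(Object o) {
            return o instanceof SimpleBlock && ((SimpleBlock) o).blockId == blockId;
        }

        // The hash is computed from the block id only.
        @Override
        public int hashCode() {
            return (int) (blockId ^ (blockId >>> 32));
        }
    }

    public static void main(String[] args) {
        TreeSet<SimpleBlock> blocks = new TreeSet<>();
        blocks.add(new SimpleBlock(42L, 1001L));

        // Lookup succeeds with an arbitrary generation stamp: prints "true".
        System.out.println(blocks.contains(new SimpleBlock(42L, 0L)));

        // A second generation of the same block id is not added: prints "false".
        System.out.println(blocks.add(new SimpleBlock(42L, 1002L)));
    }
}
```

Note that in this sketch the set keeps the entry that was inserted first, so code that needs the latest generation stamp would have to update the stored object explicitly.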

Additional comments:
- {{getReplicaInfo()}} adds generation stamp checking. I don't think this is necessary.
- The comment {{// ... ignore generation stamp!!!}} is misleading and should be removed.
- {{ReplicaInfo.setGenStamp(), getGenStamp()}} should rather be called {{setGenerationStamp(),
- Why does {{ReplicaInfo}} need a genStamp field? Don't we always have it in {{Block}}? If we
do, could you please add a comment clarifying what this field actually is?

> Set block id as the key to Block
> --------------------------------
>                 Key: HDFS-512
>                 URL: https://issues.apache.org/jira/browse/HDFS-512
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>         Attachments: blockKey.patch
> Currently the key to Block is block id + generation stamp. I would propose to change it to be only block id. This is based on the following properties of the dfs cluster:
> 1. On each datanode only one replica of block exists. Therefore there is only one generation of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, search for a block/replica's meta information is easier since most of the time we know a block's id but may not know its generation stamp.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
