hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-512) Set block id as the key to Block
Date Fri, 31 Jul 2009 02:12:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737395#action_12737395 ]

Tsz Wo (Nicholas), SZE commented on HDFS-512:
---------------------------------------------

> The way I look at this problem is that I'd like to treat a generation stamp as a property
> of a block just as the property block length. The only key to identify a block is the block
> id.

If this were a new class, you would be right that we could view it that way.  However, Block
is an existing base class and it has been there for a very long time.  We cannot simply view
it one way yesterday and another way today.  Also, it belongs to the
org.apache.hadoop.hdfs.protocol package, so changing its semantics may break some external
applications.

Inside HDFS, there are:
- HashMap<Block, BlockScanInfo> blockMap in DataBlockScanner
- Map<Block, BalancerBlock> globalBlockList in Balancer
- List<HashMap<Block, BalancerBlock>> movedBlocks in Balancer.MovedBlocks
- Map<Block, PendingBlockInfo> pendingReplications in PendingReplicationBlocks
- List<TreeSet<Block>> priorityQueues in UnderReplicatedBlocks
- BlockInfo extends Block
- BlockMetaDataInfo extends Block
- ...

Have you considered the implications for them?
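
To make the concern concrete, here is a minimal sketch of how a change in key semantics
shows up in hash-based and sorted collections such as the ones listed above.  SketchBlock
and BlockKeySketch are hypothetical stand-ins written for this illustration, not the real
Block class, and the idOnlyKey flag exists only to switch between the current key (block id
+ generation stamp) and the proposed key (block id only).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for Block; NOT the real org.apache.hadoop.hdfs.protocol.Block.
class SketchBlock implements Comparable<SketchBlock> {
    final long blockId;
    final long generationStamp;
    // Illustration-only switch: true = key is block id only, false = block id + stamp.
    final boolean idOnlyKey;

    SketchBlock(long blockId, long generationStamp, boolean idOnlyKey) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
        this.idOnlyKey = idOnlyKey;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchBlock)) {
            return false;
        }
        SketchBlock other = (SketchBlock) o;
        if (blockId != other.blockId) {
            return false;
        }
        return idOnlyKey || generationStamp == other.generationStamp;
    }

    @Override
    public int hashCode() {
        int idHash = Long.hashCode(blockId);
        return idOnlyKey ? idHash : 31 * idHash + Long.hashCode(generationStamp);
    }

    @Override
    public int compareTo(SketchBlock other) {
        int byId = Long.compare(blockId, other.blockId);
        if (byId != 0 || idOnlyKey) {
            return byId;
        }
        return Long.compare(generationStamp, other.generationStamp);
    }
}

class BlockKeySketch {
    // Entries left in a HashMap after inserting two generations of one block id.
    static int mapEntries(boolean idOnlyKey) {
        Map<SketchBlock, String> map = new HashMap<>();
        map.put(new SketchBlock(1L, 100L, idOnlyKey), "gen-100");
        map.put(new SketchBlock(1L, 101L, idOnlyKey), "gen-101");
        return map.size();
    }

    // Entries left in a TreeSet after inserting two generations of one block id.
    static int setEntries(boolean idOnlyKey) {
        TreeSet<SketchBlock> set = new TreeSet<>();
        set.add(new SketchBlock(1L, 100L, idOnlyKey));
        set.add(new SketchBlock(1L, 101L, idOnlyKey));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap, id + stamp key: " + mapEntries(false));
        System.out.println("HashMap, id only key:    " + mapEntries(true));
        System.out.println("TreeSet, id + stamp key: " + setEntries(false));
        System.out.println("TreeSet, id only key:    " + setEntries(true));
    }
}
```

With the id-only key, the second put replaces the value but java.util.HashMap keeps the
original key object, so the key stored in the map still carries the older generation stamp.
Any code that reads the stamp back from a map key or a set element would have to be checked
for this.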

> Set block id as the key to Block
> --------------------------------
>
>                 Key: HDFS-512
>                 URL: https://issues.apache.org/jira/browse/HDFS-512
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: blockKey.patch
>
>
> Currently the key to Block is block id + generation stamp. I would propose to change
> it to be only block id. This is based on the following properties of the dfs cluster:
> 1. On each datanode only one replica of block exists. Therefore there is only one generation
> of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, search for a block/replica's meta information is easier since most
> of the time we know a block's id but may not know its generation stamp.
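
The lookup convenience described in the issue can be sketched as follows, assuming a key
whose equality ignores the generation stamp.  IdKeyedBlock, LookupSketch, UNKNOWN_STAMP and
the String metadata value are hypothetical names used only for this illustration; they are
not taken from blockKey.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical block key whose equality and hash ignore the generation stamp.
final class IdKeyedBlock {
    final long blockId;
    final long generationStamp;

    IdKeyedBlock(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IdKeyedBlock && ((IdKeyedBlock) o).blockId == blockId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(blockId);
    }
}

class LookupSketch {
    // Placeholder stamp for callers that know only the block id (an assumption).
    static final long UNKNOWN_STAMP = 0L;

    // Finds replica metadata from the block id alone; the stamp does not affect the match.
    static String findByIdOnly(Map<IdKeyedBlock, String> replicas, long blockId) {
        return replicas.get(new IdKeyedBlock(blockId, UNKNOWN_STAMP));
    }

    static String demo() {
        Map<IdKeyedBlock, String> replicas = new HashMap<>();
        replicas.put(new IdKeyedBlock(7L, 1003L), "replica metadata for block 7");
        return findByIdOnly(replicas, 7L);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the key also included the generation stamp, the same get call would miss unless the
caller supplied the exact stamp, and a caller without it would have to scan the map.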

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

