hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-512) Set block id as the key to Block
Date Wed, 05 Aug 2009 08:12:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739367#action_12739367 ]

dhruba borthakur commented on HDFS-512:

There are some advantages to using the generation stamp as part of the unique identifier for
a Block object. It ensures that all code treats blocks with different generation stamps as
different blocks that can have different contents. It might
not be a big deal for NN data structures, especially because the NN first checks whether
a block belongs to a file before inserting it into the BlocksMap. But for external tools that
use a block interface (e.g. Balancer, fsck), it might be helpful for them to understand
that blocks with different generation stamps are different blocks. (Do these utilities use
the Block object at all?)
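
To make the point above concrete, here is a minimal Java sketch of an identity that includes the generation stamp. StampedBlockKey and StampedKeyDemo are hypothetical names invented for this illustration; they are not the real HDFS Block class and only assume that identity is (block id, generation stamp).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a block identity; NOT the real HDFS Block class.
final class StampedBlockKey {
    final long blockId;
    final long generationStamp;

    StampedBlockKey(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    // Two keys are equal only if both the id and the generation stamp match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StampedBlockKey)) return false;
        StampedBlockKey other = (StampedBlockKey) o;
        return blockId == other.blockId
                && generationStamp == other.generationStamp;
    }

    @Override
    public int hashCode() {
        return Objects.hash(blockId, generationStamp);
    }
}

final class StampedKeyDemo {
    // Builds a map that holds two generations of the same block id.
    static Map<StampedBlockKey, String> twoGenerations() {
        Map<StampedBlockKey, String> contents = new HashMap<>();
        contents.put(new StampedBlockKey(42L, 1L), "old contents");
        contents.put(new StampedBlockKey(42L, 2L), "new contents");
        return contents;
    }

    public static void main(String[] args) {
        Map<StampedBlockKey, String> contents = twoGenerations();
        // Both generations coexist because the stamp is part of the key.
        System.out.println(contents.size());
        System.out.println(contents.get(new StampedBlockKey(42L, 2L)));
    }
}
```

With this kind of key, any tool that compares or hashes blocks distinguishes generations automatically, at the price of needing the stamp for every lookup.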

@Raghu: > This is probably a good time to add Block to ReplicaInfo. 

If we follow Raghu's suggestion, then can we continue using the genstamp as part of the Block

There are other cases (especially during block report processing) where we would have to
do wild-card lookups for a block. But the cost of these extra lookups might be minimal
because they would occur only in the error code path.
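
A rough Java sketch of such a wild-card lookup follows. The class WildcardLookupSketch, its methods, and the linear scan used for the fallback are all assumptions made for illustration; the actual BlocksMap code may handle this quite differently.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store keyed by (block id, generation stamp).
final class WildcardLookupSketch {
    private final Map<Map.Entry<Long, Long>, String> blocks = new HashMap<>();

    // Builds the composite key; SimpleImmutableEntry supplies equals/hashCode.
    private static Map.Entry<Long, Long> key(long blockId, long genStamp) {
        return new SimpleImmutableEntry<Long, Long>(blockId, genStamp);
    }

    void put(long blockId, long genStamp, String info) {
        blocks.put(key(blockId, genStamp), info);
    }

    // Common case: exact match on id and stamp.
    // Error case: the reported stamp is unknown, so fall back to a
    // wild-card search for any generation with the same block id.
    String lookup(long blockId, long reportedGenStamp) {
        String exact = blocks.get(key(blockId, reportedGenStamp));
        if (exact != null) {
            return exact;
        }
        // Assumption: a plain scan, acceptable only because this path is rare.
        for (Map.Entry<Map.Entry<Long, Long>, String> e : blocks.entrySet()) {
            if (e.getKey().getKey() == blockId) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

The extra work happens only when the exact lookup misses, which matches the argument that the added cost stays on the error path.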

> Set block id as the key to Block
> --------------------------------
>                 Key: HDFS-512
>                 URL: https://issues.apache.org/jira/browse/HDFS-512
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: Append Branch
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: Append Branch
>         Attachments: blockKey.patch
> Currently the key to Block is block id + generation stamp. I would propose to change
> it to be only block id. This is based on the following properties of the dfs cluster:
> 1. On each datanode only one replica of block exists. Therefore there is only one generation
> of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, search for a block/replica's meta information is easier since most
> of the time we know a block's id but may not know its generation stamp.
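
For comparison, the proposal in the issue description could look roughly like the Java sketch below, where the map is keyed by block id alone and the generation stamp is kept as a field of the value. IdKeyedBlocksSketch and Meta are hypothetical names; this is not the contents of blockKey.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map keyed by block id only.
final class IdKeyedBlocksSketch {
    // Hypothetical metadata; the stamp is data here, not part of the key.
    static final class Meta {
        final long generationStamp;
        final long numBytes;

        Meta(long generationStamp, long numBytes) {
            this.generationStamp = generationStamp;
            this.numBytes = numBytes;
        }
    }

    private final Map<Long, Meta> byId = new HashMap<>();

    // A newer generation replaces the older one: one entry per block id.
    void put(long blockId, Meta meta) {
        byId.put(blockId, meta);
    }

    // Lookup needs only the id; callers compare stamps themselves if needed.
    Meta find(long blockId) {
        return byId.get(blockId);
    }

    int size() {
        return byId.size();
    }
}
```

Here a caller that knows only the block id can find the entry directly, and must check the stored generation stamp explicitly when the generation matters.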

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
