hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4645) Move from randomly generated block ID to sequentially generated block ID
Date Thu, 28 Mar 2013 17:15:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616433#comment-13616433 ]

Suresh Srinivas commented on HDFS-4645:

I should have added more relevant links. I have now linked the related jira HDFS-898, which
describes the reason for using random block IDs and the motivation for moving to sequential
block IDs. I have filed this as a separate jira because the solution I am proposing is slightly different.

Unlike HDFS-898, where the approach was to find a large enough contiguous range in the block ID
space, in this jira I just want to begin at a starting ID and generate block IDs sequentially.
For old installs, if a block ID already exists in the block map, we just skip over it.
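
The skip-over behaviour described above can be sketched roughly as follows. This is only an
illustration, not the HDFS implementation: the class name SequentialBlockIdSketch, the method
nextBlockId, and the use of a plain Set<Long> as a stand-in for the block map are all assumptions
made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names and structure are assumptions, not HDFS code.
class SequentialBlockIdSketch {
    // Last ID handed out (or the configured starting point).
    private final AtomicLong current;
    // Stand-in for the block map: IDs that already exist on an old install.
    private final Set<Long> existingIds;

    SequentialBlockIdSketch(long startId, Set<Long> existingIds) {
        this.current = new AtomicLong(startId);
        this.existingIds = existingIds;
    }

    // Advance the counter until it lands on an ID that is not already in use.
    long nextBlockId() {
        long candidate;
        do {
            candidate = current.incrementAndGet();
        } while (existingIds.contains(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        Set<Long> legacy = new HashSet<>();
        legacy.add(1002L); // randomly generated IDs left over from the old scheme
        legacy.add(1003L);
        SequentialBlockIdSketch gen = new SequentialBlockIdSketch(1000L, legacy);
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.nextBlockId()); // prints 1001, 1004, 1005
        }
    }
}
```

Because the counter only moves forward, an ID it has already passed is never produced again, so
collisions are only possible with IDs left over from the old random scheme.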

Sequential generation has a lot of advantages. We have a large number of bits at our disposal
that can be used for additional purposes in the block ID. This is one thing we considered when
we added the federation feature: adding the block pool ID into the block ID itself, using a few
bits from the long. This could also come in handy for identifying different categories of blocks
in the future, by using a few bits to tag a block as such. There is also the advantage of being
able to segregate blocks generated before an epoch, given the predictable order of generation.
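
To make the idea of spending a few bits of the long concrete, here is a minimal sketch. The
layout (4 tag bits, 12 block pool bits, 48 sequence bits) and every name in it are assumptions
chosen for illustration; nothing in this jira fixes a layout.

```java
// Hypothetical bit layout for a 64-bit block ID; the field widths are assumptions.
class BlockIdLayoutSketch {
    static final int SEQ_BITS = 48;  // sequential counter
    static final int POOL_BITS = 12; // block pool index
    static final int TAG_BITS = 4;   // block category tag
    static final long SEQ_MASK = (1L << SEQ_BITS) - 1;
    static final long POOL_MASK = (1L << POOL_BITS) - 1;
    static final long TAG_MASK = (1L << TAG_BITS) - 1;

    // Combine the three fields into one long, rejecting values that do not fit.
    static long pack(int tag, int pool, long seq) {
        if ((tag & ~TAG_MASK) != 0 || (pool & ~POOL_MASK) != 0 || (seq & ~SEQ_MASK) != 0) {
            throw new IllegalArgumentException("field out of range");
        }
        return ((long) tag << (POOL_BITS + SEQ_BITS)) | ((long) pool << SEQ_BITS) | seq;
    }

    // Unsigned shifts (>>>) keep extraction correct even when the top bit is set.
    static int tag(long id) { return (int) ((id >>> (POOL_BITS + SEQ_BITS)) & TAG_MASK); }
    static int pool(long id) { return (int) ((id >>> SEQ_BITS) & POOL_MASK); }
    static long seq(long id) { return id & SEQ_MASK; }

    public static void main(String[] args) {
        long id = pack(3, 17, 123456789L);
        System.out.println(tag(id) + " " + pool(id) + " " + seq(id)); // prints 3 17 123456789
    }
}
```

Within one tag and pool, IDs still grow in generation order, so a check such as "was this block
created before sequence number N" stays a simple comparison on the sequence field.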

All new installs would use sequential block IDs. Old installs would, over a period of time,
also slowly get rid of blocks whose IDs were generated using the previous scheme,
as old data gets deleted.

bq. Will new block ids be allowed to be < 0?
I think we could reserve a few block IDs, say 0-65535, and start generating from 65535. When the
ID reaches some max, we could roll over to negative numbers. That is a decision that can be made
in the future.
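
A rough sketch of the reserved range and the rollover, under stated assumptions: the names
ReservedRangeSketch, LAST_RESERVED_ID, isReserved and next are hypothetical, the counter is
assumed to increment before returning (so seeding it at 65535 makes 65536 the first generated
ID), and the "max" is assumed to be Long.MAX_VALUE, where Java long arithmetic wraps to
Long.MIN_VALUE on its own.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; the reserved range and rollover point are assumptions.
class ReservedRangeSketch {
    static final long LAST_RESERVED_ID = 65535L; // 0..65535 are never generated

    static boolean isReserved(long id) {
        return id >= 0 && id <= LAST_RESERVED_ID;
    }

    // Increment first, then return; skip any value that falls in the reserved range.
    static long next(AtomicLong counter) {
        long id = counter.incrementAndGet();
        while (isReserved(id)) {
            id = counter.incrementAndGet();
        }
        return id;
    }

    public static void main(String[] args) {
        AtomicLong fresh = new AtomicLong(LAST_RESERVED_ID);
        System.out.println(next(fresh)); // prints 65536

        AtomicLong nearMax = new AtomicLong(Long.MAX_VALUE);
        // Overflow wraps to Long.MIN_VALUE, i.e. the rollover to negative numbers.
        System.out.println(next(nearMax) == Long.MIN_VALUE); // prints true
    }
}
```
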
> Move from randomly generated block ID to sequentially generated block ID
> ------------------------------------------------------------------------
>                 Key: HDFS-4645
>                 URL: https://issues.apache.org/jira/browse/HDFS-4645
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
> Currently block IDs are randomly generated. This means there is no pattern to block ID
> generation and no guarantees such as uniqueness of block ID for the life time of the system
> can be made. I propose using SequentialNumber for block ID generation.

