hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-898) Sequential generation of block ids
Date Thu, 04 Feb 2010 23:58:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829860#action_12829860

Konstantin Shvachko commented on HDFS-898:

h2. Approach #2

Here is another idea initiated by Rob Chansler, which will potentially let us free all positive
longs for future block id generation.
Let's set 8 or less high bits of existing block ids to 1. There is a high probability there
will be no id collisions in the resulting set.
More formally. We have a collection of blocks <b ~i~>.  And we build a new collection
<c ~i~ = Mask | b ~i~>, where Mask = ff00 0000 0000 0000. Collection <b ~i~> does
not have repetitions, and there is a very good chance that collection <c ~i~> also does
not have them.
The intuition here is that if you have randomly distributed points in n-dimensional space
and you project them into (n-1)-dimensional sub-space, say along one of the axis, the probability
that two points will be projected into the same image is low.

I am attaching a document which derives a formula for the probability. The formula was independently
confirmed by Nicholas. The bottom line is this:
- The probability of getting a collision when setting the first 8 bits to 1 is < 0.03065
- The probability of getting a collision when setting only one high bit to 1 is < 0.0001221

>From practical point of view it is enough to set at least one highest bit. Then we'll
free the entire segment of positive longs for new block id generation.

I implemented a block id analyzing routine, which combines the two approaches described in
the issue. The routine reads the image using OIV and first tries to reset bits starting from
the high 8 and going down to 1 highest bit whenever resetting more bits leads to collisions.
If the routine fails to reset any bits collision-free, then it uses the initial approach,
that is, just finds the biggest free gap in existing block ids.

If the bit reset approach succeeds then block replicas need to be renamed on data-nodes. This
will be done via an upgrade. During the upgrade data-nodes hardlink replicas into the new
directory (automatically handled by the upgrade framework), and then rename the newly created
links reflecting the new block ids (the specific part of this upgrade).
In order to avoid any mishaps I also propose to assign new generation stamp to all blocks
renamed. So that when data-nodes that missed the upgrade wake up they will not report old
blocks. The latter is impossible, because data-nodes cannot join the cluster until they upgrade,
but we don't want to take any chances.

I tested the routine on 6 images so far, containing from 150,000 to 40,000,000 blocks. All
of them successfully tolerated the reset of 8 bits without ANY collisions.

I would like to ask the community for some help. If people could run the tool on their images
and post some stats or just send the results to me I'll take care of summarizing them.

> Sequential generation of block ids
> ----------------------------------
>                 Key: HDFS-898
>                 URL: https://issues.apache.org/jira/browse/HDFS-898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.22.0
>         Attachments: HighBitProjection.pdf
> This is a proposal to replace random generation of block ids with a sequential generator
in order to avoid block id reuse in the future.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message