hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-898) Sequential generation of block ids
Date Fri, 05 Feb 2010 00:46:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829890#action_12829890
] 

Todd Lipcon commented on HDFS-898:
----------------------------------

I get slightly different figures than you guys... I am looking at this as identical to the
well-known Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem

In our case, we have 2^(64-b) "days" and 2^26 "people"

We have 2^(64-b) "days" and B=2^26 "people". Following the formula on Wikipedia:

{noformat}
In [21]: n = 2^26
In [22]: d = 2^(64-8)
In [23]: reduce(operator.mul, [(1 - float(i)/d)  for i in xrange(0, n)])
Out[23]: 0.0037908372356959502
{noformat}

whereas you've calculated 0.03065 for this case.

The python above agrees with Wikipedia for the birthday example, so I think the code is correct:

{noformat}
In [25]: d = 365
In [26]: n = 23
In [27]: reduce(operator.mul, [(1 - float(i)/d)  for i in xrange(0, n)])
Out[27]: 0.49270276567601451
{noformat}

Wary of floating point math, I also checked using int math to calculate numerator and denominator,
then int division to make them smaller, then float division to get a fraction:
{noformat}
In [70]: num,denom = (reduce(operator.mul, [d - i for i in xrange(0, n)])), (d**(n))
In [71]: float(num/100000000000000000000)/float(denom/100000000000000000000)
Out[71]: 0.0037908372356959502
{noformat}

So where are our numbers diverging?

> Sequential generation of block ids
> ----------------------------------
>
>                 Key: HDFS-898
>                 URL: https://issues.apache.org/jira/browse/HDFS-898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.22.0
>
>         Attachments: DuplicateBlockIds.patch, HighBitProjection.pdf
>
>
> This is a proposal to replace random generation of block ids with a sequential generator
in order to avoid block id reuse in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message