[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=12829890#action_12829890
]
Todd Lipcon commented on HDFS-898:

I get slightly different figures than you guys... I am looking at this as identical to the
well-known Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem
In our case, we have 2^(64-b) "days" and B=2^26 "people". Following the formula on Wikipedia:
{noformat}
In [21]: n = 2^26
In [22]: d = 2^(64-8)
In [23]: reduce(operator.mul, [(1 - float(i)/d) for i in xrange(0, n)])
Out[23]: 0.0037908372356959502
{noformat}
whereas you've calculated 0.03065 for this case.
The Python above agrees with Wikipedia for the birthday example, so I think the code is correct:
{noformat}
In [25]: d = 365
In [26]: n = 23
In [27]: reduce(operator.mul, [(1 - float(i)/d) for i in xrange(0, n)])
Out[27]: 0.49270276567601451
{noformat}
Wary of floating point math, I also checked using int math to calculate numerator and denominator,
then int division to make them smaller, then float division to get a fraction:
{noformat}
In [70]: num,denom = (reduce(operator.mul, [d - i for i in xrange(0, n)])), (d**(n))
In [71]: float(num/100000000000000000000)/float(denom/100000000000000000000)
Out[71]: 0.0037908372356959502
{noformat}
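As an alternative to dividing both integers by a large constant, the exact check could also be sketched with `fractions.Fraction` from the standard library, which keeps the product as an exact rational until the final conversion to float. This is a Python 3 style sketch shown for the Wikipedia example (d = 365, n = 23); the function name `exact_no_collision` is hypothetical and not part of the session above.

```python
from fractions import Fraction


def exact_no_collision(n, d):
    """Exact probability that n draws from d equally likely values are all distinct."""
    p = Fraction(1)
    for i in range(n):
        # Multiply by (d - i) / d as an exact rational, with no rounding.
        p *= Fraction(d - i, d)
    return p


p = exact_no_collision(23, 365)
print(float(p))  # close to 0.4927, as in the Wikipedia example
```

Only the last step rounds, so any disagreement with the float version would point at accumulated floating-point error rather than at the formula.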
So where are our numbers diverging?
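One thing worth checking (an observation about Python, not something stated in the thread): `^` is bitwise XOR in Python, not exponentiation, so `2^26` evaluates to 24 and `2^(64-8)` to 58. The sketch below is written for Python 3, with `functools.reduce` and `range` standing in for the `reduce` and `xrange` used in the session. It recomputes the product for those small values, then uses the standard approximation 1 - exp(-n(n-1)/(2d)) with `**` to estimate the collision probability for the intended sizes. The helper name `no_collision_probability` is made up for illustration.

```python
import math
import operator
from functools import reduce


def no_collision_probability(n, d):
    """Exact birthday product: chance that n draws from d values are all distinct."""
    return reduce(operator.mul, [(1 - float(i) / d) for i in range(0, n)])


# In Python, ^ is XOR, so these are not powers of two.
n_xor = 2 ^ 26        # evaluates to 24
d_xor = 2 ^ (64 - 8)  # evaluates to 58
p_xor = no_collision_probability(n_xor, d_xor)

# Intended sizes, using ** for exponentiation.
n = 2 ** 26
d = 2 ** (64 - 8)
# Approximation P(collision) ~= 1 - exp(-n(n-1)/(2d)); avoids a 2**26-term loop.
p_collision = -math.expm1(-n * (n - 1) / (2.0 * d))

print(n_xor, d_xor, p_xor)
print(p_collision)
```

If that is what happened, the 0.0038 figure is the no-collision probability for 24 "people" and 58 "days", while the approximation with real powers of two gives a collision probability of about 0.03, in the same range as the 0.03065 quoted above.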
> Sequential generation of block ids
> 
>
> Key: HDFS-898
> URL: https://issues.apache.org/jira/browse/HDFS-898
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 0.20.1
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: DuplicateBlockIds.patch, HighBitProjection.pdf
>
>
> This is a proposal to replace random generation of block ids with a sequential generator in order to avoid block id reuse in the future.

