hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11152) Better random number generator
Date Mon, 29 Sep 2014 20:19:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152209#comment-14152209 ]

Colin Patrick McCabe commented on HADOOP-11152:

The OpenSSL random number generator should be plenty fast, since it just uses the RDRAND instruction
on Intel CPUs.  We could make this accessible via our usual "pluggable class for generating
random numbers" deal.

bq. One idea is to use something like Mitzenmacher's Power of Two Choices. It's interesting
to think about how we could determine "load" on a DN: total # of blocks, # of blocks assigned
to it in the last n minutes, # of open blocks

Spark uses Mitzenmacher's work here to coalesce RDDs:


see "pickBin"
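
For reference, the basic Power of Two Choices step can be sketched roughly as below. This is only an illustration, not Spark's pickBin and not existing Hadoop code: the class name TwoChoicePicker and the int[] load array (standing in for whichever per-DN load metric we settle on, e.g. # of open blocks) are assumptions made up for the example.

```java
import java.util.Random;

/**
 * Hypothetical helper illustrating the "power of two choices" idea:
 * sample two candidates uniformly at random and keep the less loaded one.
 */
final class TwoChoicePicker {
    private final Random rng;

    TwoChoicePicker(Random rng) {
        this.rng = rng;
    }

    /**
     * @param load load[i] is an assumed load metric for node i
     *             (for example, the number of open blocks on that node)
     * @return the index of the chosen node
     */
    int pick(int[] load) {
        if (load.length == 0) {
            throw new IllegalArgumentException("no nodes to choose from");
        }
        // Two independent uniform samples; they may land on the same node.
        int a = rng.nextInt(load.length);
        int b = rng.nextInt(load.length);
        // Keep whichever candidate currently carries less load.
        return load[a] <= load[b] ? a : b;
    }
}
```

A caller would bump load[chosen] after each placement, so later picks see the updated load. Compared with a single uniform pick, this tends to keep the most loaded node much closer to the average.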

> Better random number generator
> ------------------------------
>                 Key: HADOOP-11152
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11152
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Luke Lu
>              Labels: newbie++
> HDFS-7122 showed that naive ThreadLocal usage of simple LCG based j.u.Random creates
> unacceptable distribution of random numbers for block placement. Similarly, ThreadLocalRandom
> in Java 7 (same static thread local with synchronized methods overridden) has the same problem.

> "Better" is defined as better quality and faster than j.u.Random (which is already much
> faster (20x) than SecureRandom).
> People (e.g. Numerical Recipes) have shown that by combining LCG and XORShift we can
> have a better fast RNG. It'd be worthwhile to investigate a thread local version of these
> "better" RNGs.
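
As a rough illustration of the LCG-plus-XORShift idea in the description above, a thread-local generator might look something like the sketch below. It is not the exact Numerical Recipes generator and not a proposed patch: the class name XorShiftLcgRandom, the per-thread seeding from System.nanoTime() and the thread id, and the choice of constants (Knuth's MMIX LCG multiplier and increment, Marsaglia's 13/7/17 xorshift triple) are all assumptions for the example, and its statistical quality would still need to be measured.

```java
/**
 * Hypothetical sketch: XORs the output of a 64-bit LCG with a 64-bit
 * xorshift generator. Instances are not thread-safe; use current() to get
 * a per-thread instance.
 */
final class XorShiftLcgRandom {
    // Anonymous subclass keeps this compatible with Java 7.
    private static final ThreadLocal<XorShiftLcgRandom> LOCAL =
        new ThreadLocal<XorShiftLcgRandom>() {
            @Override
            protected XorShiftLcgRandom initialValue() {
                // Assumed seeding scheme; a real one needs more care so that
                // threads started close together get unrelated streams.
                long seed = System.nanoTime() ^ (Thread.currentThread().getId() << 32);
                return new XorShiftLcgRandom(seed);
            }
        };

    private long lcg; // LCG state
    private long xs;  // xorshift state, must never be zero

    XorShiftLcgRandom(long seed) {
        lcg = seed;
        xs = seed ^ 0x9E3779B97F4A7C15L;
        if (xs == 0) {
            xs = 0x9E3779B97F4A7C15L;
        }
    }

    /** Returns the generator owned by the calling thread. */
    static XorShiftLcgRandom current() {
        return LOCAL.get();
    }

    long nextLong() {
        // 64-bit LCG step (constants from Knuth's MMIX generator).
        lcg = lcg * 6364136223846793005L + 1442695040888963407L;
        // xorshift64 step with the 13/7/17 shift triple.
        xs ^= xs << 13;
        xs ^= xs >>> 7;
        xs ^= xs << 17;
        // Combine the two streams.
        return lcg ^ xs;
    }

    /** Returns a value in [0, bound); has slight modulo bias for large bounds. */
    int nextInt(int bound) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive");
        }
        return (int) ((nextLong() >>> 1) % bound);
    }
}
```

Callers would use XorShiftLcgRandom.current().nextInt(n) wherever they currently share a single j.u.Random, so there is no synchronization on the hot path.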

This message was sent by Atlassian JIRA
