Date: Mon, 29 Sep 2014 20:19:34 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-11152) Better random number generator

[ https://issues.apache.org/jira/browse/HADOOP-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152209#comment-14152209 ]

Colin Patrick McCabe commented on HADOOP-11152:
-----------------------------------------------

The OpenSSL random number generator should be plenty fast, since it just uses the RDRAND instruction on Intel CPUs. We could make this accessible via our usual "pluggable class for generating random numbers" deal.

bq. One idea is to use something like Mitzenmacher's Power of Two Choices.
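For readers unfamiliar with the quoted idea, here is a minimal sketch of "power of two choices" placement: sample two candidates uniformly at random and keep the less loaded one. This is an illustration only, not HDFS or Spark code. The class name TwoChoices, the methods pick and place, and the use of a plain int array as the "load" metric are all assumptions made for the example.

```java
import java.util.Random;

/** Hypothetical sketch of "power of two choices" placement; not HDFS code. */
class TwoChoices {
    /**
     * Samples two indices uniformly at random and returns the one whose
     * current load is lower (ties go to the first sample).
     */
    static int pick(int[] load, Random rng) {
        int a = rng.nextInt(load.length);
        int b = rng.nextInt(load.length);
        return load[a] <= load[b] ? a : b;
    }

    /** Places items one at a time and returns the resulting per-node load. */
    static int[] place(int nodes, int items, Random rng) {
        int[] load = new int[nodes];
        for (int i = 0; i < items; i++) {
            load[pick(load, rng)]++;   // "load" here is just the item count
        }
        return load;
    }

    public static void main(String[] args) {
        int[] load = place(100, 10000, new Random());
        int max = 0;
        for (int x : load) {
            max = Math.max(max, x);
        }
        // The mean load is 100; the maximum is usually only slightly above it.
        System.out.println("max load = " + max);
    }
}
```

Any real use would have to replace the item count with whatever load signal is chosen for a DN.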
It's interesting to think about how we could determine "load" on a DN: total # of blocks, # of blocks assigned to it in the last n minutes, # of open blocks.

Spark uses Mitzenmacher's work here to coalesce RDDs: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala (see "pickBin").

> Better random number generator
> ------------------------------
>
>                 Key: HADOOP-11152
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11152
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Luke Lu
>              Labels: newbie++
>
> HDFS-7122 showed that naive ThreadLocal usage of simple LCG based j.u.Random creates unacceptable distribution of random numbers for block placement. Similarly, ThreadLocalRandom in java 7 (same static thread local with synchronized methods overridden) has the same problem.
> "Better" is defined as better quality and faster than j.u.Random (which is already much faster (20x) than SecureRandom).
> People (e.g. Numerical Recipes) have shown that by combining LCG and XORShift we can have a better fast RNG. It'd be worthwhile to investigate a thread local version of these "better" RNG.
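To make the "combine LCG and XORShift, thread local" suggestion in the issue description concrete, here is a rough sketch. It is not the Numerical Recipes generator and has not been put through any statistical test suite. It adds the output of a 64-bit LCG (Knuth's MMIX constants) to a Marsaglia xorshift64 stream (shifts 13, 7, 17). The class name LcgXorShift, the seeding scheme, and the current() accessor are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a 64-bit LCG combined with a xorshift64 generator. */
final class LcgXorShift {
    private static final long GOLDEN = 0x9E3779B97F4A7C15L;

    private long lcg;  // LCG state, any value is allowed
    private long xs;   // xorshift state, must never be zero

    LcgXorShift(long seed) {
        lcg = seed;
        xs = seed ^ GOLDEN;
        if (xs == 0) {
            xs = GOLDEN;  // zero is a fixed point of xorshift
        }
    }

    long nextLong() {
        // LCG step (Knuth's MMIX multiplier and increment).
        lcg = lcg * 6364136223846793005L + 1442695040888963407L;
        // xorshift64 step (Marsaglia's 13, 7, 17 triple).
        xs ^= xs << 13;
        xs ^= xs >>> 7;
        xs ^= xs << 17;
        // Combine the two streams by addition.
        return lcg + xs;
    }

    /** Returns a value in [0, bound); simple modulo, so slightly biased. */
    int nextInt(int bound) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive");
        }
        return (int) ((nextLong() >>> 1) % bound);
    }

    // Assumption: each thread gets a distinct seed from a shared counter.
    private static final AtomicLong SEEDS = new AtomicLong(System.nanoTime());

    private static final ThreadLocal<LcgXorShift> LOCAL =
        ThreadLocal.withInitial(() -> new LcgXorShift(SEEDS.getAndAdd(GOLDEN)));

    /** Returns the calling thread's generator; no synchronization needed. */
    static LcgXorShift current() {
        return LOCAL.get();
    }
}
```

A caller would use something like LcgXorShift.current().nextInt(n) where it currently uses a thread-local j.u.Random. Whether the per-thread seeds are decorrelated enough for block placement would need to be checked against the HDFS-7122 scenario.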