Return-Path: X-Original-To: apmail-commons-dev-archive@www.apache.org Delivered-To: apmail-commons-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A4EBB18270 for ; Tue, 29 Dec 2015 16:38:37 +0000 (UTC) Received: (qmail 62596 invoked by uid 500); 29 Dec 2015 16:38:37 -0000 Delivered-To: apmail-commons-dev-archive@commons.apache.org Received: (qmail 62449 invoked by uid 500); 29 Dec 2015 16:38:37 -0000 Mailing-List: contact dev-help@commons.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "Commons Developers List" Delivered-To: mailing list dev@commons.apache.org Received: (qmail 62437 invoked by uid 99); 29 Dec 2015 16:38:37 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Dec 2015 16:38:36 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 7D3B21804A1 for ; Tue, 29 Dec 2015 16:38:36 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.699 X-Spam-Level: X-Spam-Status: No, score=0.699 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id OJ4Et22Oz9cs for ; Tue, 29 Dec 2015 16:38:29 +0000 (UTC) Received: from mail-pf0-f174.google.com (mail-pf0-f174.google.com [209.85.192.174]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 766DE20185 for ; Tue, 29 Dec 2015 16:38:29 +0000 (UTC) Received: by mail-pf0-f174.google.com with SMTP id e65so76841562pfe.1 for ; Tue, 29 Dec 2015 08:38:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-type:content-transfer-encoding; bh=8DOt8jbuqz+6TE0wbFDOs6buJGARp6E8IbczBzQf5L4=; b=CTS9zB6T8DFz0YQWN2ETUC2G+Wa8EVb/ycAmC1SOq+vvh3aCD0zftTqOa61iIfTUKh NizbW0b8VuO4vXsVkDRSG/9sg5OzjnTwtJ670KvjNO4rFU5UjkTSkUBHgnoibNyFFpOl GyhcDdV1DTimu4ur665auzSjj61cFMfaBvyWMGbUOpZOXoqgsxQ5UDlDEvnMgCvO35G7 cHiMtB/GzzS+MwL7S2r1ks+X2qqsNlJdNtGsSKA95Odf+0arLoMmOdvqPcYJzvNpN1ra t4gUxgU4n0hGlhLgg0mJuDLZIlW8HD2SHddxcmyjD57l5XeSMuaU2+BL+Q7ZtBzlhJV5 zB6w== X-Received: by 10.98.64.9 with SMTP id n9mr30579262pfa.87.1451407103423; Tue, 29 Dec 2015 08:38:23 -0800 (PST) Received: from psteitz-mbp.local (ip68-106-39-89.ph.ph.cox.net. [68.106.39.89]) by smtp.googlemail.com with ESMTPSA id yl1sm89257278pac.35.2015.12.29.08.38.22 for (version=TLSv1/SSLv3 cipher=OTHER); Tue, 29 Dec 2015 08:38:22 -0800 (PST) Subject: Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307) To: Commons Developers List References: <6bea105b38b74e90b8018080ccd6c56b@git.apache.org> <56817AB8.5020100@gmail.com> <5681FF04.3070908@gmail.com> <56824298.4080309@gmail.com> <5682535B.4040207@spaceroots.org> From: Phil Steitz X-Enigmail-Draft-Status: N1110 Message-ID: <5682B6FE.4060109@gmail.com> Date: Tue, 29 Dec 2015 09:38:22 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <5682535B.4040207@spaceroots.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 12/29/15 2:33 AM, Luc Maisonobe wrote: > Hi all, > > Le 29/12/2015 09:21, Thomas Neidhart a =C3=A9crit : >> On 12/29/2015 04:33 AM, Phil Steitz wrote: >>> On 12/28/15 8:08 PM, Gilles wrote: >>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote: >>>>> The significant refactoring to eliminate the (standard) next(int) >>>>> included in these changes has the possibility of introducing subtle= >>>>> bugs or performance issues. Please run some tests to verify that >>>>> the same sequences are generated by the 3_X code >>>> IIUC our unit tests of the RNGs, this is covered. >>> No. Not sufficient. What you have done is changed the internal >>> implementation of all of the Bitstream generators. I am not >>> convinced that you have not broken anything. I will have to do the >>> testing myself. I see no point in fiddling with the internals of >>> this code that has had a lot of eyeballs and testing on it. I was >>> not personally looking forward to researching the algorithms to make >>> sure any invariants may be broken by these changes; but I am now >>> going to have to do this. I have to ask why. Please at some point >>> read [1], especially the sections on "Avoid Flexibility Syndrom" and >>> "Value Laziness as a Virtue." Gratuitous refactoring drains >>> community energy.=20 >> +1, on top of that I think we should aim to refactor the parts that >> really need refactoring and try to keep the number of incompatibilitie= s >> to the 3_X branch as minimal as possible. >> >> Thomas >> >>>>> and the refactored >>>>> code and benchmarks to show there is no loss in performance. >>>> Given that there are exactly two operations _less_ (a subtraction >>>> and a shift), it would be surprising. >>>> >>>>> It >>>>> would also be good to have some additional review of this code by >>>>> PRNG experts. >>>> The "nextInt()" code is exactly the same as the "next(int)" modulo >>>> the little change above (in the last line of the "nextInt/next" >>>> code). >>>> >>>> That change in "nextInt/next" implied similarly tiny recodings in >>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apar= t >>>> from that, were copied from "BitsStreamGenerator". >>>> >>>> [However tiny a change, I had made a mistake... and dozens of tests >>>> started to fail. Found the typo and all was quiet again...] >>>> >>>> About "next(int)" being standard, it would be interesting to know >>>> what that means. > In all the papers I have read concerning pseudo random number > generation, the basic model was based on small chunks of bits, > much of the time the size of an int because this is what computer > manages directly (they have no provision to manage chunks of 5 or > 11 bits for example). That's what I thought. Internally, is it correct to assume that the generation is always in int-sized blocks so Gilles is correct that the bits parameter is only used for output truncation? > > Deriving other primitive types from this (boolean, long, double) is > really an add-on. I even asked an expert about the (Pierre L'Ecuyer > if I remember well) about some explanations for converting to double > (which is simply done by multiplying by a constant representing the > weight of the least significant bit in order to constrain the range to > [0; 1]). His answer was that this ensured the theoretical mathematical > proofs that apply to uniform distribution still apply, as only this > case (uniformity over a multi-dimensional integral grid) has been > studied. It seems nothing has been studied about using the exponential > features of floating point representation in relationship with > double random number generation directly. Makes sense. The - excellent - tests included with the Well, Mersenne and Isaac generators verify (as Gilles states) that nextInt() itself is likely unaffected by the changes above. What I am worried about is the conversions. Do those changes look correct to you? That is what needs to be carefully reviewed and tested.=20 > > Hence everybody starts from int, and the mathematicians proved us > this method works and some properties are preserved (multi-dimensional > independance, long period, ...) that are essential typically for > Monte-Carlo analyses. > > I know nothing about random number generation for secure application > like cryptograpgy, except that it requires completely different > properties, often completely opposite to what is needed for > Monte-Carlo analysis. As an example, it should be impossible to > reproduce a secure sequence (it cannot be deterministic), whereas in > Monte-Carlo we *want* it to be reproducible if we reuse the same seed. > >>> Have a look at the source code for the JDK generators, for example. >>>> As I indicated quite clearly in one of my first posts about this >>>> refactoring >>>> 1. all the CM implementations generate random bits in batches >>>> of 32 bits, and >>>> 2. before returning, the "next(int bits)" method was truncating >>>> the generated "int": >>>> return x >>> (32 - bits); >>>> >>>> In all implementations, that was the only place where the "bits" >>>> parameter was used, from which I concluded that the randomness >>>> provider does not care if the request was to create less than 32 >>>> random bits. >>>> Taking "nextBoolean()" for example, it looks like a waste of 31 >>>> bits (or am I missing something?). >>> Quite possibly, yes, you are missing something. > I would guess it is linked to performance consideration. Pseudo > random number generation is sometimes put under very heavy stress > with billions of numbers generated. It should run extremelly fast, > and the algorithms have been designed to have tremendously long periods= > (things like 2^19937 -1). With such long periods, we can waste 31 > bits each time we produce 1 bit if it saves some overhead. > > best regards, > Luc > >>>> Of course, if some implementation were able to store the bits not >>>> requested by the last call to "next(int)", then I'd understand that >>>> we must really provide access to a "next(int)" method. >>>> >>>> Perhaps that the overhead of such bookkeeping is why the practical >>>> algorithms chose to store integers rather than bits (?). >>>> >>>> As you dismissed my request about CM being able to care for a RNG >>>> implementation based on a "long", I don't quite understand the >>>> caring for a "next(int)" that serves no more purpose (as of current >>>> CM). >>>> >>> This change is >>>> Gilles >>>> >>>> >>>>> Phil >>>>> >>>>> On 12/28/15 10:23 AM, erans@apache.org wrote: >>>>>> Repository: commons-math >>>>>> Updated Branches: >>>>>> refs/heads/master 7b62d0155 -> 81585a3c4 >>>>>> >>>>>> >>>>>> MATH-1307 >>>>>> >>>>>> New base class for RNG implementations. >>>>>> The source of randomness is provided through the "nextInt()" >>>>>> method (to be defined in subclasses). >>>>>> >>>>>> >>>>>> [...] >>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf >>>> --------------------------------------------------------------------= - >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org >>>> For additional commands, e-mail: dev-help@commons.apache.org >>>> >>>> >>> >>> >>> ---------------------------------------------------------------------= >>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org >>> For additional commands, e-mail: dev-help@commons.apache.org >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org >> For additional commands, e-mail: dev-help@commons.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org > For additional commands, e-mail: dev-help@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org For additional commands, e-mail: dev-help@commons.apache.org