commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] random boolean arrays
Date Mon, 13 Jul 2015 13:30:44 GMT
On 7/12/15 9:45 PM, Otmar Ertl wrote:
> On Sun, Jul 12, 2015 at 8:16 PM, Gilles <gilles@harfang.homelinux.org> wrote:
>> On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
>>> On 07/12/2015 04:58 PM, Phil Steitz wrote:
>>>> On 7/12/15 2:50 AM, Thomas Neidhart wrote:
>>>>> On 07/11/2015 09:43 PM, Phil Steitz wrote:
>>>>>> On 7/11/15 12:29 PM, Thomas Neidhart wrote:
>>>>>>> On 07/11/2015 09:08 PM, Phil Steitz wrote:
>>>>>>>> The code implemented in MATH-1242 to improve performance
of KS
>>>>>>>> monteCarloP in-lines efficient generation of random boolean
arrays.
>>>>>>>>   Unfortunately, I think the implementation is not quite
random (see
>>>>>>>> comments on the ticket).  To verify it, we need to be able
to test
>>>>>>>> the random boolean array generation directly.  To do that,
we have
>>>>>>>> to either expose the method (at least as protected) in the
KS class
>>>>>>>> or add it somewhere else.  I propose the latter but am not
sure
>>>>>>>> where to put it.  For speed, we need to avoid array copies,
so the
>>>>>>>> API will have to be something like randomize(boolean[], nTrue).
 It
>>>>>>>> could go in the swelling MathArrays class, or RandomDataGenerator.
>>>>>>>> The latter probably makes more sense, but the API does not
fit too
>>>>>>>> well.  Any ideas?
>>>>>>> If it is just for testing purposes, you can also access the method
in
>>>>>>> question via reflection, see an example here:
>>>>>>>
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
>>>>>> Do you think it *should* be a private method of the K-S class?
>>>>> Right now, I do not see much uses outside the class, but if we decide
to
>>>>> make it public then I would prefer a special util class in the random
>>>>> package to avoid cluttering the MathArrays class.
>>>>
>>>> OK, for now we can make it private and use the reflection method
>>>> above to test it.
>>>
>>> ok, but I guess it is also fine to make it package private as sebb
>>> suggested. We did something similar recently for some of the improved
>>> sampling methods provided by Otmar.
>>>
>>>>> Regarding the RandomDataGenerator: I think this class should be
>>>>> deprecated and replaced by a Sampler interface as proposed by Gilles.
>>>>
>>>> Please consider keeping this class.  Consider this a user request.
>>>> I have quite a few applications that use this class for two reasons:
>>>
>>> ok, the reason why I thought the class should be deprecated is because
>>> it was not kept up-to-date with all the new discrete and continuous
>>> distributions that we added in the last 2-3 years. If you think it is
>>> useful, then we can keep it of course.
>>>
>>>> 1.  One object instance tied to one PRNG that generates data from
>>>> multiple different distributions.   This is convenient.   Sure, I
>>>> could refactor all of these apps to instantiate new objects for each
>>>> type of generated data and hopefully still be able to peg them to
>>>> one PRNG; but that is needless work that also complicates the code.
>>>>
>>>> 2.  There are quite a few methods in this class that have nothing to
>>>> do with sampling (nextPermutation, nextHexString, nextSecureXxx,
>>>> etc) but which conveniently share the RandomGenerator.  I guess the
>>>> utility methods get moved out somewhere else.  Again, I end up
>>>> having to refactor all of my code that uses them and when I want
>>>> simulations to be based on a single PRNG, I have to find a way to
>>>> pass the RandomGenerator around to them.
>>>>
>>>> I don't yet see the need to refactor the sampling support in the
>>>> distributions package; but as my own apps are not impacted by this,
>>>> if everyone else sees the user impact of the refactoring as
>>>> outweighed by the benefit, I won't stand in the way.   Please lets
>>>> just keep the RandomDataGenerator convenience class in the random
>>>> package in any case.  I will take care of whatever adjustments are
>>>> needed to adapt to whatever we settle on for sampling in the
>>>> distributions package.
>>>
>>> Well, it is not really necessary to do everything together and refactor
>>> the distributions.
>>>
>>> Probably it is better to start the other way round, and describe what I
>>> want to add, and see how other things fit in:
>>>
>>>  * I want a generic Sampler interface, i.e. something like this:
>>>  ** nextSample()
>>>  ** nextSamples(int size)
>>>  ** nextSamples(double[] samples)
>>
>> +1
>>
>>> there could be a DiscreteSampler and ContinuousSampler interface to
>>> handle the cases for int / double values.
>>
>> Perhaps the name should be IntegerSampler and DoubleSampler, to accomodate
>> future needs (LongSampler, BooleanSampler (?)).
> I would prefer consistent names for distributions and corresponding
> samplers. The support of distributions must be of the same data type
> as the return values of the corresponding sampler. Therefore, I would
> call the samplers for RealDistribution and IntegerDistribution
> RealSampler and IntegerSampler, respectively.

+1

Phil
>
>>> The distributions could either be changed to return such a sampler as
>>> Gilles proposed (with the advantage that no random instance is tied to
>>> the distribution itself), or implement the interface directly (with the
>>> advantage that we would not need to refactor too much).
>>
>> Unless I'm missing something, the refactoring would be fairly the same:
>> The latter case needs implementing 3 methods (2 new ones, one with a name
>> change).
>> The former needs implementing the factory method proposed in MATH-1158,
>> plus the same methods as above (wrapped in the object returned by the
>> factory method).
>>
>>
>> Gilles
>>
>>
>>
>>>>> One can then create a sampler for any distribution or from other
>>>>> sources, e.g. when needing a fast and efficient sampler without
>>>>> replacement (see MATH-1239).
>>>>
>>>> +1 for sequential sampling.  I don't follow exactly why that
>>>> requires refactoring the distributions; but if it helps in a way I
>>>> don't yet understand, that will help convince me that refactoring
>>>> sampling in the distributions package is worth the user pain.
>>>
>>> as I said above, I wanted to combine two things in one step, maybe it is
>>> better to go step by step.
>>>
>>> Thomas
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
> Otmar
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message