commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [math] Thread-safety and initialization overhead in distribution sample() methods
Date Thu, 09 Oct 2014 12:43:56 GMT
> > [...]
>>>
>>
>> That proposal would
>> 1. fix the performance issue reported in MATH-1154,
>>
>
> this proposal would only fix it in 4.0, as we can not remove the rng
> references in the distribution classes immediately.

<tease>
  There is not a single way
  I can oppose what you say:
  This cannot be a good day!
</tease>

The proposal is design-oriented.
I seem to recall (but I may be wrong) that a similar thing was
proposed (not by me) when "sample" was introduced, but was not
retained.
It was a wrong decision at the time, since this would have made
the design cleaner. But this argument is generally not well
received here, unfortunately (even when the opportunity for
change is given by an identified issue).

>
>> 2. fix that issue without resorting to the so called "smelly
>>    workaround",
>>
>
> there is nothing smelly about it,

I agree: That was not _my_ statement.

> and if you want to fix this issue in 3.4
> there is not much choice

As I said early on, you could have applied the patch already. I
do not even know why there need to be a discussion about it as
it fixes an issue with zero side-effect.

> (apart from the lazy initialization proposal).

Otmar's proposal (explicit "LazyRNG"): +1
Hidden lazy init in "AbstractRealDistribution" or in the
individual RNGs: -1 (with rationale given in a previous message).

But I now agree with Phil that it is more complex than necessary.
(i.e. I humbly find my proposal simpler).

>> 3. simplify the "RealDistribution" interface namely by
>>    removing the "reseedRandomGenerator" method (whose presence
>>    has always been a flaw, IMO): the RNG is and stays under the
>>    caller's explicit responsibility,
>> 4. simplify the concrete distribution classes, by removing all
>>    the constructors that take a RNG argument,
>> 5. make the distribution classes thread-safe, by making the
>>    "devilishly hard" problem (of ensuring RNG thread-safety) go
>>    away.
>> Moreover:
>> 6. An application layer that only needs sampling could pass a
>>    "Sampler" object: hiding the implicit distribution could make
>>    the application cleaner (self-documented).
>> 7. The modification can be backwards compatible, and user can
>>    upgrade easily.
>>
>
> with the other points I agree,

Thank you!

And I forgot some:

8a. No risk to raise a NPE by an unforeseen usage of "sample", and thus
8b. No need to document a warning about it.

> but I would make the sample(RandomGenerator)
> method public, so that users have the choice to directly call it 
> rather
> than create a Sampler for this distribution.

The idea (i.e my opinion, which I naturally find better, as
long as I hold it :) was to minimize the public interface of
"RealDistribution", up to not even mention the notion of
"sampling".
Actually, we could separate even more and have "Sampler" and
"SamplerProvider" interfaces, independent of the
"RealDistribution" interface.
Thus each concrete distribution is free to implement and/or
advertize (or not) a sampling functionality.

This is even cleaner; but it requires a little refactoring of the
unit tests (i.e. define a method "makeSamplerProvider" to get an
object of the apporpriate type for sampling-specific tests).

Alternatively, if we make "sample" part of "RealDistribution", we
limit the generality a priori. Even though it is not a problem
right now (as all the distributions implemented in CM can provide
sampling), we cannot know (as we didn't know that issue MATH-1154
would come up) and as simple as can be (but not simpler) is better.


Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message