commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-50) PoissonSampler single use speed improvements
Date Fri, 27 Jul 2018 21:00:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560319#comment-16560319
] 

Alex D Herbert commented on RNG-50:
-----------------------------------

{quote}Would it be good enough for your use-case?{quote}

Yes. That is what I mocked up in the simple test attached to this ticket. The effect is that
repeated single use of the {{PoissonSampler}} is very close to re-use with the same sampler.

Off topic: Where would be a good place to provide feedback on the library? For example there
is no way to programatically obtain the information that is in the Javadoc of {{org.apache.commons.rng.simple.RandomSource}}.
This could be done with e.g:

{code:java}
public enum RandomSource {
    /**
     * Source of randomness is {@link org.apache.commons.rng.core.source32.JDKRandom}.
     * <ul>
     *  <li>Native seed type: {@code Long}.</li>
     *  <li>Native seed size: 1.</li>
     * </ul>
     */
    JDK(ProviderBuilder.RandomSourceInternal.JDK) { 
          public int getSeedSize() { return 1; },
     },

     // etc ...
     ;

     public Class<?> getSeedType() { getInternalIdentifier().getSeed(); }
     public abstract int getSeedSize();
}
{code}

There is obviously some overlap with the {{ProviderBuilder.RandomSourceInternal}} and it would
be good to at least expose the arguments required for 

{code:java}
    public static RestorableUniformRandomProvider create(RandomSource source,
                                                         Object seed,
                                                         Object ... data)
{code}

I am only asking as I am creating a seed for the {{Well19937c}} class that I can use to create
it repeatedly with little cost. If I know the seed length then I can store the seed and use
it for the constructor. I can then change the type of RNG dynamically and appropriately store
the correct seed. At the moment I would have to create a function to return the seed size
and type for each RNG I want to support using the information in the Javadocs.

As far as I can see the use of the {{RestorableUniformRandomProvider}} requires that you first
construct a provider (so paying the seeding cost) and then you can restore it to a previous
state. So the initialisation with the constructor seed is a waste. It would be nice to have
a constructor or factory method that accepts a {{RandomProviderState}} instead.

I am sure as a I get more familiar with the library I'd like to give more feedback so is this
the best place for that (i.e. new feature request tickets)?



> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely
written for fast sampling of small dataset sizes. The constructors for the samplers do not
check the input parameters are valid for the respective distributions (in contrast to the
old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice
for speed. Thus most of the samplers can be used within a loop to sample just one value with
very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean
is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0
this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes
the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table
is computed dynamically as a static construction method then the overhead will be in the first
use. Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                        
 (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog
( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                        
 (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog
( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                        
 (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog
( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                        
 (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog
( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                        
 (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog
( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message