commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-50) PoissonSampler single use speed improvements
Date Thu, 26 Jul 2018 20:14:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558843#comment-16558843
] 

Alex D Herbert commented on RNG-50:
-----------------------------------

OK. The PoissonSampler does check the input mean parameter. It doesn't check the RNG is not
null. I was looking for example at the {{BoxMullerGaussianSampler}} and the {{AhrensDieterMarsagliaTsangGammaSampler}}
where the constructors params are not checked. It is these three samplers that I am interested
in for my use case which is modelling the signal from point sources of light on a microscope
camera, as in super-resolution microscopy.

Ignoring the Gaussian and Gamma sampling the use case of the Poisson sampling is to model
photon shot noise on a camera image, e.g 512x512 is 262,144 pixels. Each pixel value is likely
to be unique during a simulation that creates floating-point images. This will be repeated
for potentially thousands of frames. So this is a lot of single Poisson samples.

I understand that you only want to create a cache if it is to be used. However it is being
recreated each time the PoissonSampler is used for mean above 40. This is a sub-optimal cache.

My code fix to set the {{FactorialLog}} as static would mean that any use of the {{PoissonSampler}}
with a mean under 40 would suffer the overhead of computing the cache. However the cache will
be computed only once. Subsequent uses would not suffer.

A better solution would be improved reuse of previously computed values in the {{FactorialLog}}
class. This can be done using a single static master table of {{log(n!)}} values that can
be copied. Here is an example that I created for my own functions that require {{log(n!)}}:
[LogFactorial|https://github.com/aherbert/GDSC-SMLM/blob/master/src/main/java/uk/ac/sussex/gdsc/smlm/function/LogFactorial.java]

The master table can be increased in size and reduced. An instance can be created that reuses
values from the master table and computes the rest it needs.

If the {{FactorialLog}} class is modified then when it is used by other samplers then they
can take advantage of any already computed values.
 

 

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely
written for fast sampling of small dataset sizes. The constructors for the samplers do not
check the input parameters are valid for the respective distributions (in contrast to the
old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice
for speed. Thus most of the samplers can be used within a loop to sample just one value with
very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean
is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0
this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes
the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table
is computed dynamically as a static construction method then the overhead will be in the first
use. Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                        
 (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog
( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                        
 (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog
( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                        
 (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog
( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                        
 (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog
( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                        
 (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog
( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message