flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2533) Gap based random sample optimization
Date Wed, 09 Sep 2015 02:49:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736038#comment-14736038 ]

ASF GitHub Bot commented on FLINK-2533:
---------------------------------------

Github user ChengXiangLi commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1110#discussion_r39002786
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/sampling/PoissonSampler.java ---
    @@ -28,11 +29,16 @@
      *
      * @param <T> The type of sample.
      * @see <a href="https://en.wikipedia.org/wiki/Poisson_distribution">https://en.wikipedia.org/wiki/Poisson_distribution</a>
    + * @see <a href="http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/">Gap Sampling</a>
      */
     public class PoissonSampler<T> extends RandomSampler<T> {
     	
     	private PoissonDistribution poissonDistribution;
     	private final double fraction;
    +	private final Random random = new Random();
    --- End diff --
    
    The "random" instance would be better initialized in the constructor, using the seed
    parameter when one is available.
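
A minimal sketch of what this suggestion could look like. The class `SeededSamplerSketch` and its accessors are hypothetical and only illustrate the pattern; this is not the actual PoissonSampler code from the pull request. The idea is that the `Random` field stays final but is assigned in the constructors, so a caller-supplied seed can be passed through.

```java
import java.util.Random;

// Hypothetical stand-in for a sampler class; not the actual Flink PoissonSampler.
public class SeededSamplerSketch {

    private final double fraction;
    private final Random random;

    // No seed available: fall back to a default-seeded generator.
    public SeededSamplerSketch(double fraction) {
        this.fraction = fraction;
        this.random = new Random();
    }

    // Seed available: pass it on so that sampling becomes reproducible.
    public SeededSamplerSketch(double fraction, long seed) {
        this.fraction = fraction;
        this.random = new Random(seed);
    }

    public double getFraction() {
        return fraction;
    }

    // Exposes the generator's next uniform value in [0, 1), for illustration only.
    public double nextUniform() {
        return random.nextDouble();
    }
}
```

With this shape, two samplers built with the same seed draw the same sequence of random values, which is what makes seeded tests repeatable.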


> Gap based random sample optimization
> ------------------------------------
>
>                 Key: FLINK-2533
>                 URL: https://issues.apache.org/jira/browse/FLINK-2533
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Priority: Minor
>
> For random samplers that take a fraction, such as BernoulliSampler and PoissonSampler, a
> gap-based random sampler could use an O(k) sampling implementation instead of the previous
> O(n) implementation. It should perform better when the sample fraction is very small.
> [This blog|http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/]
> describes gap-based random sampling in more detail.
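
The idea described above can be sketched for the Bernoulli case as follows. This is an illustrative sketch only, not the code from the pull request: the class `GapSamplingSketch`, the method `sample`, and the use of a random-access `List` are assumptions made here. Instead of drawing one random number per input element, it draws the length of the gap to the next selected element from a geometric distribution, so the number of random draws is proportional to the number of sampled elements (k) rather than the input size (n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of gap-based Bernoulli sampling; not the Flink implementation.
public class GapSamplingSketch {

    // Selects each element with probability `fraction` by jumping over gaps.
    public static <T> List<T> sample(List<T> input, double fraction, long seed) {
        if (fraction <= 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1)");
        }
        Random random = new Random(seed);
        // log(1 - p); log1p keeps precision when the fraction is very small.
        double logOneMinusP = Math.log1p(-fraction);
        List<T> result = new ArrayList<>();
        int index = -1; // index of the last selected element
        while (true) {
            // u lies in (0, 1], so log(u) is finite and <= 0.
            double u = 1.0 - random.nextDouble();
            // Number of elements to skip before the next selected one (geometric).
            long gap = (long) Math.floor(Math.log(u) / logOneMinusP);
            long remaining = (long) input.size() - (index + 1);
            if (gap >= remaining) {
                break; // the next selected element would lie past the end
            }
            index += 1 + (int) gap;
            result.add(input.get(index));
        }
        return result;
    }
}
```

For example, `GapSamplingSketch.sample(list, 0.01, 42L)` returns roughly 1% of the elements in their original order. With a plain iterator rather than a random-access list, the skipped elements still have to be advanced over, so the saving is in random number generation rather than in traversal.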



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
