datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-16) weighted reservoir sampling with exponential jumps UDF
Date Mon, 10 Feb 2014 22:01:22 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897085#comment-13897085 ]

Matthew Hayes commented on DATAFU-16:
-------------------------------------

I think an exponential jump version of the accumulator-based reservoir sample UDF could make
sense.  It seems like this could help with performance in some cases, especially when producing
a large sample.  Have you run any performance tests to compare the two accumulator-based implementations
to see under what circumstances it helps and by how much?

> weighted reservoir sampling with exponential jumps UDF
> ------------------------------------------------------
>
>                 Key: DATAFU-16
>                 URL: https://issues.apache.org/jira/browse/DATAFU-16
>             Project: DataFu
>          Issue Type: New Feature
>         Environment: Mac, Linux
> pig-0.11
>            Reporter: jian wang
>            Priority: Minor
>         Attachments: ScoredExpJmpReservoir.java, ScoredReservoir.java, WeightedSamplingCorrectnessTests.java
>
>
> Create a weightedReservoirSampleWithExpJump UDF to implement the weighted reservoir sampling
> algorithm with exponential jumps. Investigation is tracked in https://github.com/linkedin/datafu/issues/80.
> This task is part of an experiment with different weighted sampling algorithms.
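
For readers unfamiliar with the technique, here is a minimal sketch of weighted reservoir sampling with exponential jumps, following the A-ExpJ scheme described by Efraimidis and Spirakis. It is not the attached ScoredExpJmpReservoir.java and does not use the Pig UDF or Accumulator interfaces. The class name ExpJumpReservoir, its methods, and the seed parameter are hypothetical, chosen only for illustration. Weights are assumed to be positive and finite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical illustration class; not part of DataFu.
final class ExpJumpReservoir<T> {
    // An item together with its random key u^(1/weight).
    private static final class Entry<T> implements Comparable<Entry<T>> {
        final T item;
        final double key;
        Entry(T item, double key) { this.item = item; this.key = key; }
        @Override public int compareTo(Entry<T> other) {
            return Double.compare(key, other.key);
        }
    }

    private final int k;
    private final Random rng;
    // Min-heap on key: the head is the entry that gets evicted next.
    private final PriorityQueue<Entry<T>> heap = new PriorityQueue<>();
    // Total weight still to be skipped before the next replacement.
    private double skip;

    ExpJumpReservoir(int k, long seed) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
        this.rng = new Random(seed);
    }

    void add(T item, double weight) {
        if (!(weight > 0)) return; // assumption: ignore non-positive or NaN weights
        if (heap.size() < k) {
            // Fill phase: every item enters with key u^(1/w).
            heap.add(new Entry<>(item, Math.pow(uniform(), 1.0 / weight)));
            if (heap.size() == k) skip = nextSkip();
            return;
        }
        // Jump phase: no random number is drawn for skipped items.
        skip -= weight;
        if (skip <= 0) {
            double minKey = heap.peek().key;
            // The new key is drawn from (minKey, 1) so that it beats the current minimum.
            double lower = Math.pow(minKey, weight);
            double r = lower + (1.0 - lower) * uniform();
            heap.poll();
            heap.add(new Entry<>(item, Math.pow(r, 1.0 / weight)));
            skip = nextSkip();
        }
    }

    // Returns the current sample in no particular order.
    List<T> sample() {
        List<T> out = new ArrayList<>();
        for (Entry<T> e : heap) out.add(e.item);
        return out;
    }

    // Weight to skip: ln(u) / ln(minKey); both logs are negative, so the result is positive.
    private double nextSkip() {
        return Math.log(uniform()) / Math.log(heap.peek().key);
    }

    // Uniform draw in the open interval (0, 1).
    private double uniform() {
        double u;
        do { u = rng.nextDouble(); } while (u == 0.0);
        return u;
    }
}
```

The potential saving over the plain key-per-item approach is that random numbers are generated only when an item actually enters the reservoir, rather than once per input item. How much that matters in practice depends on the sample size relative to the input, which is what a performance comparison would need to measure.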



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)