datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <>
Subject [jira] [Commented] (DATAFU-16) weighted reservoir sampling with exponential jumps UDF
Date Mon, 10 Feb 2014 22:01:22 GMT


Matthew Hayes commented on DATAFU-16:

I think an exponential jump version of the accumulator-based reservoir sample UDF could make
sense.  It seems like this could help with performance in some cases, especially when producing
a large sample.  Have you run any performance tests to compare the two accumulator-based implementations
to see under what circumstances it helps and by how much?

> weighted reservoir sampling with exponential jumps UDF
> ------------------------------------------------------
>                 Key: DATAFU-16
>                 URL:
>             Project: DataFu
>          Issue Type: New Feature
>         Environment: Mac, Linux
> pig-0.11
>            Reporter: jian wang
>            Priority: Minor
>         Attachments:
> Create a weightedReservoirSampleWithExpJump UDF to implement the weighted reservoir sampling
> algorithm with exponential jumps. Investigation is tracked in
> This task is part of an experiment with different weighted sampling algorithms.
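
For readers unfamiliar with the technique under discussion, here is a minimal Python sketch of weighted reservoir sampling with exponential jumps, in the style of the A-ExpJ algorithm described by Efraimidis and Spirakis. It is not the DataFu UDF and not the code attached to this issue. The function name weighted_sample_expj, its parameters, and the log-domain key representation are assumptions made for illustration, and weights are assumed to be positive and finite. The point it shows is the one raised in the comment above: instead of drawing a random key for every item, the sampler draws how much total weight to skip before the next replacement, so skipped items cost only a subtraction.

```python
import heapq
import math
import random


def weighted_sample_expj(items, k, rng=None):
    """Sample up to k values from an iterable of (value, weight) pairs.

    Illustrative sketch only (hypothetical name, not a DataFu API).
    Keys are stored as log(u) / weight, the logarithm of u ** (1 / weight).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()

    def next_jump(log_threshold):
        # Total weight to skip before the next replacement.
        # A threshold of 0.0 (key == 1) can never be beaten.
        if log_threshold == 0.0:
            return math.inf
        # 1 - random() lies in (0, 1], so the log is always defined.
        return math.log(1.0 - rng.random()) / log_threshold

    heap = []  # min-heap of (log_key, index, value); index breaks ties
    jump = None
    for index, (value, weight) in enumerate(items):
        if weight <= 0:
            raise ValueError("weights must be positive")
        if len(heap) < k:
            # Fill phase: every item gets its own random key.
            log_key = math.log(1.0 - rng.random()) / weight
            heapq.heappush(heap, (log_key, index, value))
            if len(heap) == k:
                jump = next_jump(heap[0][0])
            continue
        # Jump phase: skipped items only reduce the remaining jump.
        jump -= weight
        if jump > 0:
            continue
        # This item replaces the current minimum. Its key is drawn
        # conditional on exceeding the old threshold.
        low = math.exp(weight * heap[0][0])
        r = min(1.0, low + (1.0 - low) * (1.0 - rng.random()))
        heapq.heapreplace(heap, (math.log(r) / weight, index, value))
        jump = next_jump(heap[0][0])
    return [value for _, _, value in heap]
```

A call such as weighted_sample_expj([("a", 5.0), ("b", 1.0), ("c", 1.0)], 2) returns two of the three values, with "a" the most likely to be kept. Any saving over a key-per-item sampler comes from the jump phase, so how much it helps in practice would depend on the sample size relative to the input, which is what the performance question above is asking about.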

This message was sent by Atlassian JIRA
