datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (DATAFU-5) Update SimpleRandomSample (SRS) to be consistent with SimpleRandomSampleWithReplacement (SRSWR)
Date Thu, 06 Mar 2014 04:58:47 GMT

     [ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes closed DATAFU-5.
------------------------------


> Update SimpleRandomSample (SRS) to be consistent with SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-5
>                 URL: https://issues.apache.org/jira/browse/DATAFU-5
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>             Fix For: 1.3.0
>
>         Attachments: 0001-update-SimpleRandomSample-to-be-consistent-with-Simp.patch, DATAFU-5.patch, DATAFU-5.patch, DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the constructor of the UDF, while SRSWR takes the sample size in the function call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability, and optionally a lower bound on the size of the population as inputs, while SRSWR takes a bag of items, a desired sample size, and optionally a lower bound on the size of the population as inputs.
> Another benefit of the patch is that users no longer have to create multiple instances of the UDF to sample with different probabilities.
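
The interface change described in the issue can be illustrated with a minimal Python sketch. This is not DataFu code and does not reproduce the sampling algorithm used by the actual UDFs. The class names ConstructorBoundSampler and CallTimeSampler are hypothetical, the sample size ceil(p * n) is an assumption made for the illustration, and the optional lower bound on the population size is omitted.

```python
import math
import random


def _check_probability(p):
    # Hypothetical helper: reject values outside [0, 1].
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")


class ConstructorBoundSampler:
    """Pre-patch style: the probability is fixed when the sampler is built."""

    def __init__(self, p, seed=None):
        _check_probability(p)
        self.p = p
        self.rng = random.Random(seed)

    def sample(self, items):
        # Assumption for this sketch: draw ceil(p * n) items without replacement.
        k = math.ceil(self.p * len(items))
        return self.rng.sample(items, k)


class CallTimeSampler:
    """Post-patch style: the probability is an argument of each call."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample(self, items, p):
        _check_probability(p)
        k = math.ceil(p * len(items))
        return self.rng.sample(items, k)


population = list(range(100))

# Pre-patch style: one instance per probability.
half = ConstructorBoundSampler(0.5, seed=0).sample(population)
quarter = ConstructorBoundSampler(0.25, seed=0).sample(population)

# Post-patch style: a single instance serves both probabilities.
sampler = CallTimeSampler(seed=0)
half_again = sampler.sample(population, 0.5)
quarter_again = sampler.sample(population, 0.25)
```

In the second style a single instance serves every probability, which is the benefit the description attributes to the patch.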



--
This message was sent by Atlassian JIRA
(v6.2#6252)
