Date: Sun, 16 Feb 2014 13:05:20 +0000 (UTC)
From: "jian wang (JIRA)"
To: dev@datafu.incubator.apache.org
Subject: [jira] [Assigned] (DATAFU-21) Probability weighted sampling without reservoir

     [ https://issues.apache.org/jira/browse/DATAFU-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian wang reassigned DATAFU-21:
-------------------------------

    Assignee: jian wang

> Probability weighted sampling without reservoir
> -----------------------------------------------
>
>                 Key: DATAFU-21
>                 URL: https://issues.apache.org/jira/browse/DATAFU-21
>             Project: DataFu
>          Issue Type: New Feature
>         Environment: Mac OS, Linux
>            Reporter: jian wang
>            Assignee: jian wang
>
> This issue tracks the investigation into a weighted sampler that does not use an internal reservoir.
> At present, SimpleRandomSample implements a good acceptance-rejection sampling algorithm for probability random sampling. The weighted sampler could reuse the simple random sample with a slight modification.
> The modification is this: the present simple random sample generates a uniform random number in (0, 1) as the random variable used to accept or reject an item. The weighted sampler could instead generate this random variable from the item's weight, so that it still lies in (0, 1) and the random variables of different items remain independent of each other.
> The correctness of this solution, and how to implement it efficiently, need further thought and experimentation.
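
One possible reading of the modification described in the issue, sketched in Python for illustration only. This is not DataFu's SimpleRandomSample code and not a confirmed design for DATAFU-21. It assumes a weight-dependent key of the form u ** (1 / weight), where u is uniform on (0, 1), and a fixed acceptance threshold; the names `weighted_key`, `weighted_sample`, and `threshold` are hypothetical. Under these assumptions each item is accepted independently with probability 1 - threshold ** weight, so heavier items are kept more often and no reservoir is needed. The sample size is random rather than fixed, which is one of the points the issue says still needs investigation.

```python
import random


def weighted_key(weight, rng):
    """Turn a uniform draw u in (0, 1) into the key u ** (1 / weight).

    The key also lies in (0, 1). This key form is an assumption made for
    illustration; the issue does not specify it.
    """
    if weight <= 0:
        raise ValueError("weight must be positive")
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); redraw to exclude 0
        u = rng.random()
    return u ** (1.0 / weight)


def weighted_sample(items, threshold, seed=None):
    """Stream over (item, weight) pairs and yield the accepted items.

    An item is accepted when its key exceeds `threshold`, which happens
    with probability 1 - threshold ** weight, independently of every
    other item. Nothing is buffered, so there is no reservoir.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    rng = random.Random(seed)
    for item, weight in items:
        if weighted_key(weight, rng) > threshold:
            yield item
```

With a threshold of 0.5, an item of weight 1 would be accepted about half of the time and an item of weight 4 about 94% of the time (1 - 0.5 ** 4 = 0.9375). Choosing the threshold so that the expected sample size hits a target would need the weight distribution, which this sketch does not attempt.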