From: "Eli Stevens (Gmail)"
To: user@couchdb.apache.org
Subject: Re: Random Document
Date: Tue, 21 Sep 2010 15:19:27 -0700
References: <4C9906BA.6090205@ianhobson.co.uk>

Unless there are additional restrictions that can be imposed, I'm
pretty sure that you're going to end up needing to get the full list
of IDs, and select x of them at random without replacement to fully
match 'SORT BY RANDOM LIMIT X'.

However, depending on what you are doing with them, it's possible that
other approaches might work. For example, you could add a
'uniformRandomValue' key to the doc which is set at document creation,
and have a view that does emit(doc.uniformRandomValue, doc._id). Then,
when you query the view, you can (again, depending on what you're doing
with the random selection) either pick the lowest keys ('&limit=X') or
pick a random startkey in the range along with the limit
('&startkey=0.12345&limit=X').

That works great when you're using the docs as something like a work
queue, where after being chosen once, the docs are removed from the
queue. However, if the docs stick around, you can end up with
problems.
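A rough sketch of the uniformRandomValue idea described above, not a tested CouchDB setup. It assumes every doc was given a uniformRandomValue in [0, 1) at creation; the names mapByRandomValue, randomSliceQuery and sliceRows are made up for illustration, and sliceRows is only an in-memory stand-in for what the view returns for a given startkey and limit.

```javascript
// Map function for the design document. emit() is supplied by CouchDB
// when the view is indexed; it is not defined outside the view server.
function mapByRandomValue(doc) {
  if (typeof doc.uniformRandomValue === 'number') {
    emit(doc.uniformRandomValue, doc._id);
  }
}

// Hypothetical helper: build the query string for "limit rows starting
// at a random key". The view keys are plain numbers, so startkey is
// JSON-encoded as a number.
function randomSliceQuery(limit, startkey) {
  if (startkey === undefined) {
    startkey = Math.random();
  }
  return '?startkey=' + encodeURIComponent(JSON.stringify(startkey)) +
         '&limit=' + limit;
}

// In-memory stand-in for the view query. rows must already be sorted
// by key, which is how view rows come back.
function sliceRows(rows, startkey, limit) {
  return rows
    .filter(function (row) { return row.key >= startkey; })
    .slice(0, limit);
}
```

Note that a startkey above the largest key in the view gives back fewer rows than the limit (possibly none), so the caller has to decide what to do in that case.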
Imagine your doc.uniformRandomValues look like:

urv: id
0.1: A
0.7: B
0.8: C
0.9: D

Selecting from this distribution with a random startkey and a limit of 2
makes it much less likely that A or D are selected than B or C (any
startkey between 0.1 and 0.7 returns B and C), unless you remove B and C
after they're picked the first time, and implement some sort of
wrap-around to get A if the startkey is 0.85.

If that kind of approach doesn't work for you, then it would be
helpful to know more about the requirements. :)

HTH,
Eli

On Tue, Sep 21, 2010 at 2:49 PM, Peter Braden wrote:
> Hi,
>
> I'm after a) - the equivalent of a 'SORT BY RANDOM LIMIT x' sql statement.
>
>>> But as this isn't deterministic, I'm pretty sure it's wrong.
>> I don't follow your logic. The view will show all documents in a random
>> order. The fact that it is unrepeatable may make it useless for your
>> purposes, but it does not make the maths invalid, or the statistics wrong.
>
> As far as I know, the couchdb internals rely on the fact that view keys are
> deterministic to do their view updates.
>
> I'm not entirely convinced that my current function produces a good random
> selection - if a document is updated more, and therefore its view entry is
> updated more, does that mean it has a different chance of being selected?
>
> Cheers,
>
> Peter
>
> On 21 September 2010 20:25, Ian Hobson wrote:
>
>> On 21/09/2010 18:27, Peter Braden wrote:
>>
>>> Hi,
>>>
>>> Is there a good way to get a random document from a database.
>>>
>> Hmm, that depends upon what you mean by "good", and "random" and if you
>> want a repeatable result! I guess I'm asking what exactly are you trying to
>> do?
>>
>> a) Pick a representative, and statistically defensible sample of size X
>> from a population of Y documents where each document has an equal
>> probability of being selected, and cannot be selected twice.
>>
>> b) Take a sample of size 1 from a population of Y, X times (so a given
>> document could be taken more than once)?
>>
>> c) Something similar to a or b where you don't know Y in advance?
>>
>> d) Shuffle the documents?
>>
>>> I'm currently
>>> using a view that does:
>>>
>>> function(doc) {
>>>     emit(Math.random(), doc);
>>> };
>>>
>>> But as this isn't deterministic, I'm pretty sure it's wrong.
>>>
>> I don't follow your logic. The view will show all documents in a random
>> order. The fact that it is unrepeatable may make it useless for your
>> purposes, but it does not make the maths invalid, or the statistics wrong.
>>
>> Regards
>>
>> Ian
>>
>
> --
> Peter Braden
>

--
Eli