From: Peter Braden
To: user@couchdb.apache.org
Subject: Re: Random Document
Date: Tue, 21 Sep 2010 22:49:19 +0100

Hi,

I'm after a) - the equivalent of an 'ORDER BY RANDOM() LIMIT x' SQL statement.

>> But as this isn't deterministic, I'm pretty sure it's wrong.
>
> I don't follow your logic. The view will show all documents in a random
> order. The fact that it is unrepeatable may make it useless for your
> purposes, but it does not make the maths invalid, or the statistics wrong.

As far as I know, the CouchDB internals rely on view keys being deterministic in order to do their view updates. I'm not entirely convinced that my current function produces a good random selection - if a document is updated more often, and therefore its view entry is updated more often, does that mean it has a different chance of being selected?

Cheers,
Peter

On 21 September 2010 20:25, Ian Hobson wrote:

> On 21/09/2010 18:27, Peter Braden wrote:
>
>> Hi,
>>
>> Is there a good way to get a random document from a database?
>
> Hmm, that depends upon what you mean by "good" and "random", and whether
> you want a repeatable result! I guess I'm asking: what exactly are you
> trying to do?
>
> a) Pick a representative and statistically defensible sample of size X
> from a population of Y documents, where each document has an equal
> probability of being selected and cannot be selected twice?
>
> b) Take a sample of size 1 from a population of Y, X times (so a given
> document could be taken more than once)?
>
> c) Something similar to a) or b) where you don't know Y in advance?
>
> d) Shuffle the documents?
>
>> I'm currently using a view that does:
>>
>> function(doc) {
>>   emit(Math.random(), doc);
>> };
>>
>> But as this isn't deterministic, I'm pretty sure it's wrong.
>
> I don't follow your logic. The view will show all documents in a random
> order. The fact that it is unrepeatable may make it useless for your
> purposes, but it does not make the maths invalid, or the statistics wrong.
>
> Regards
>
> Ian

--
Peter Braden
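
For option a) in the thread (a sample of size X from Y documents, each equally likely and none picked twice), one way to sidestep the non-deterministic view key is to do the random selection on the client instead of in the map function. The following is only a minimal sketch under that assumption: it presumes the document IDs have already been fetched (for example from CouchDB's `_all_docs` endpoint), and `sampleWithoutReplacement`, `ids`, and the `rng` parameter are hypothetical names chosen for illustration, not part of CouchDB.

```javascript
// Hypothetical helper (not a CouchDB API): draw a uniform sample of x items
// without replacement using a partial Fisher-Yates shuffle.
// `rng` must return a float in [0, 1); it defaults to Math.random.
function sampleWithoutReplacement(ids, x, rng = Math.random) {
  const pool = ids.slice();            // copy so the caller's array is untouched
  const n = Math.min(x, pool.length);  // cannot draw more than the population
  for (let i = 0; i < n; i++) {
    // Pick a random index from the not-yet-chosen tail [i, pool.length).
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Example: draw 3 of 5 made-up document IDs.
const ids = ["a", "b", "c", "d", "e"];
const picked = sampleWithoutReplacement(ids, 3);
console.log(picked);
```

Because the randomness lives outside the view, how often a document has been updated should not affect its chance of selection in this sketch. The cost is that all IDs must be loaded first, which may not suit very large databases.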