couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Randall Leeds <>
Subject Re: selecting a random subset of a view
Date Mon, 28 Jun 2010 15:44:36 GMT
Ian's Method One, implemented as a list function, would probably do the

On Jun 28, 2010 9:04 AM, "Ian Hobson" <> wrote:

On 28/06/2010 14:29, wrote:
> Hello couchers,
> how would you do to selec...
Hi Mickael,

You want a random sample of predetermined size, from a list that you can
only (best) access randomly. To do this you must know the number of records
in the database.

Note - I know the stats side MUCH better than the couchdb so this might not
be implementable.

Here are two methods.

Method 1.

Say, for example, you want 3 from 11.

Take the first index with a probability of 3/11  by computing a random
number in range 0 to 1, and taking the record if rand < 3/11 (using real
math, not integer).

If you take that record, adjust the number required down by 1.
Reduce the number remaining by 1.

Take the next record with a probability or 2/10 or 3/10

Continue in like manner until you either

a) Take the last record with a probability of 1/1 or
b) Have all you want, and take the remaining records with a probability of

To do this with couchdb I would read all the IDs into the client and filter
them there, and then read each records separately.

Method 2 -

Allocate a random number to each record (from a large range - we don't want
duplicates). This could be a sha or MD5 of the actual data.
Sort by the random number allocated.
Read the first N records that you need.

I think this sort of index could be set up on the server, so the client
needs only create the index, and read the first N records and remove the
index. The work on the server will be much greater that method 1 though.

If you use the MD5 or SHA function, then the order will be repeatable if the
data is not changed.



  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message