incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Hobson <...@ianhobson.co.uk>
Subject Re: selecting a random subset of a view
Date Mon, 28 Jun 2010 14:04:09 GMT
On 28/06/2010 14:29, mickael.bailly@free.fr wrote:
> Hello couchers,
>
> how would you do to select a random subset of a view result (a simple view with map only).
>
>    
Hi Mickael,

You want a random sample of predetermined size, from a list that you can 
only (best) access randomly. To do this you must know the number of 
records in the database.

Note - I know the stats side MUCH better than the couchdb so this might 
not be implementable.

Here are two methods.

Method 1.

Say, for example, you want 3 from 11.

Take the first index with a probability of 3/11  by computing a random 
number in range 0 to 1, and taking the record if rand < 3/11 (using real 
math, not integer).

If you take that record, adjust the number required down by 1.
Reduce the number remaining by 1.

Take the next record with a probability or 2/10 or 3/10

Continue in like manner until you either

a) Take the last record with a probability of 1/1 or
b) Have all you want, and take the remaining records with a probability 
of 0/n

To do this with couchdb I would read all the IDs into the client and 
filter them there, and then read each records separately.

Method 2 -

Allocate a random number to each record (from a large range - we don't 
want duplicates). This could be a sha or MD5 of the actual data.
Sort by the random number allocated.
Read the first N records that you need.

I think this sort of index could be set up on the server, so the client 
needs only create the index, and read the first N records and remove the 
index. The work on the server will be much greater that method 1 though.

If you use the MD5 or SHA function, then the order will be repeatable if 
the data is not changed.

Regards

Ian


Mime
View raw message