couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mickael.bai...@free.fr
Subject Re: selecting a random subset of a view
Date Mon, 28 Jun 2010 14:25:07 GMT
Thanks for all your answers

I forgot to tell I try to avoid the "skip" param for performances reasons.

The first method Ian suggest imply to read all my doc ids... In this case I can easily use
the client random function (client is PHP). But reading all doc ids is not really performance
friendly.

The second method Ian speak about, is the one that others (Sebastian, Robert) are proposing
: some random parameter in the document. But as Ian tells us, "If you use the MD5 or SHA function,
then the order will be repeatable if 
the data is not changed."... not so random :-)

My app is a music jukebox, I want to provide a way to fill the playlist with random songs.
Each song document have an _id composed of 'aa'+ a random sha1 .

I scratched my head already but I think the random feature should be a couchdb server feature,
and can't be implemented client-side, but by reading all documents ids and using client-side
random function, which is, once again, not really performance-friendly.

Any idea welcome...

Mickael

----- Mail Original -----
De: "Ian Hobson" <ian@ianhobson.co.uk>
À: user@couchdb.apache.org
Envoyé: Lundi 28 Juin 2010 16h04:09 GMT +01:00 Amsterdam / Berlin / Berne / Rome / Stockholm
/ Vienne
Objet: Re: selecting a random subset of a view

On 28/06/2010 14:29, mickael.bailly@free.fr wrote:
> Hello couchers,
>
> how would you do to select a random subset of a view result (a simple view with map only).
>
>    
Hi Mickael,

You want a random sample of predetermined size, from a list that you can 
only (best) access randomly. To do this you must know the number of 
records in the database.

Note - I know the stats side MUCH better than the couchdb so this might 
not be implementable.

Here are two methods.

Method 1.

Say, for example, you want 3 from 11.

Take the first index with a probability of 3/11  by computing a random 
number in range 0 to 1, and taking the record if rand < 3/11 (using real 
math, not integer).

If you take that record, adjust the number required down by 1.
Reduce the number remaining by 1.

Take the next record with a probability or 2/10 or 3/10

Continue in like manner until you either

a) Take the last record with a probability of 1/1 or
b) Have all you want, and take the remaining records with a probability 
of 0/n

To do this with couchdb I would read all the IDs into the client and 
filter them there, and then read each records separately.

Method 2 -

Allocate a random number to each record (from a large range - we don't 
want duplicates). This could be a sha or MD5 of the actual data.
Sort by the random number allocated.
Read the first N records that you need.

I think this sort of index could be set up on the server, so the client 
needs only create the index, and read the first N records and remove the 
index. The work on the server will be much greater that method 1 though.

If you use the MD5 or SHA function, then the order will be repeatable if 
the data is not changed.

Regards

Ian


Mime
View raw message