incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Newson <robert.new...@gmail.com>
Subject Re: selecting a random subset of a view
Date Mon, 28 Jun 2010 14:52:59 GMT
If your _id's are 'aa' + sha1, then you can generate a random key as
your startkey parameter to the _all_docs view. The next N hits are
randomly selected.

It should be clarified that map/reduce methods have to be
deterministic, so approaches where you'd generate random numbers to
control calls to emit() are not going to work.

B.

On Mon, Jun 28, 2010 at 3:25 PM,  <mickael.bailly@free.fr> wrote:
> Thanks for all your answers
>
> I forgot to tell I try to avoid the "skip" param for performances reasons.
>
> The first method Ian suggest imply to read all my doc ids... In this case I can easily
use the client random function (client is PHP). But reading all doc ids is not really performance
friendly.
>
> The second method Ian speak about, is the one that others (Sebastian, Robert) are proposing
: some random parameter in the document. But as Ian tells us, "If you use the MD5 or SHA function,
then the order will be repeatable if
> the data is not changed."... not so random :-)
>
> My app is a music jukebox, I want to provide a way to fill the playlist with random songs.
Each song document have an _id composed of 'aa'+ a random sha1 .
>
> I scratched my head already but I think the random feature should be a couchdb server
feature, and can't be implemented client-side, but by reading all documents ids and using
client-side random function, which is, once again, not really performance-friendly.
>
> Any idea welcome...
>
> Mickael
>
> ----- Mail Original -----
> De: "Ian Hobson" <ian@ianhobson.co.uk>
> À: user@couchdb.apache.org
> Envoyé: Lundi 28 Juin 2010 16h04:09 GMT +01:00 Amsterdam / Berlin / Berne / Rome / Stockholm
/ Vienne
> Objet: Re: selecting a random subset of a view
>
> On 28/06/2010 14:29, mickael.bailly@free.fr wrote:
>> Hello couchers,
>>
>> how would you do to select a random subset of a view result (a simple view with map
only).
>>
>>
> Hi Mickael,
>
> You want a random sample of predetermined size, from a list that you can
> only (best) access randomly. To do this you must know the number of
> records in the database.
>
> Note - I know the stats side MUCH better than the couchdb so this might
> not be implementable.
>
> Here are two methods.
>
> Method 1.
>
> Say, for example, you want 3 from 11.
>
> Take the first index with a probability of 3/11  by computing a random
> number in range 0 to 1, and taking the record if rand < 3/11 (using real
> math, not integer).
>
> If you take that record, adjust the number required down by 1.
> Reduce the number remaining by 1.
>
> Take the next record with a probability or 2/10 or 3/10
>
> Continue in like manner until you either
>
> a) Take the last record with a probability of 1/1 or
> b) Have all you want, and take the remaining records with a probability
> of 0/n
>
> To do this with couchdb I would read all the IDs into the client and
> filter them there, and then read each records separately.
>
> Method 2 -
>
> Allocate a random number to each record (from a large range - we don't
> want duplicates). This could be a sha or MD5 of the actual data.
> Sort by the random number allocated.
> Read the first N records that you need.
>
> I think this sort of index could be set up on the server, so the client
> needs only create the index, and read the first N records and remove the
> index. The work on the server will be much greater that method 1 though.
>
> If you use the MD5 or SHA function, then the order will be repeatable if
> the data is not changed.
>
> Regards
>
> Ian
>
>

Mime
View raw message