shouldn't list functions be deterministics ? ----- Mail Original ----- De: "Randall Leeds" À: user@couchdb.apache.org Envoyé: Lundi 28 Juin 2010 17h44:36 GMT +01:00 Amsterdam / Berlin / Berne / Rome / Stockholm / Vienne Objet: Re: selecting a random subset of a view Ian's Method One, implemented as a list function, would probably do the trick. On Jun 28, 2010 9:04 AM, "Ian Hobson" wrote: On 28/06/2010 14:29, mickael.bailly@free.fr wrote: > > Hello couchers, > > how would you do to selec... Hi Mickael, You want a random sample of predetermined size, from a list that you can only (best) access randomly. To do this you must know the number of records in the database. Note - I know the stats side MUCH better than the couchdb so this might not be implementable. Here are two methods. Method 1. Say, for example, you want 3 from 11. Take the first index with a probability of 3/11 by computing a random number in range 0 to 1, and taking the record if rand < 3/11 (using real math, not integer). If you take that record, adjust the number required down by 1. Reduce the number remaining by 1. Take the next record with a probability or 2/10 or 3/10 Continue in like manner until you either a) Take the last record with a probability of 1/1 or b) Have all you want, and take the remaining records with a probability of 0/n To do this with couchdb I would read all the IDs into the client and filter them there, and then read each records separately. Method 2 - Allocate a random number to each record (from a large range - we don't want duplicates). This could be a sha or MD5 of the actual data. Sort by the random number allocated. Read the first N records that you need. I think this sort of index could be set up on the server, so the client needs only create the index, and read the first N records and remove the index. The work on the server will be much greater that method 1 though. If you use the MD5 or SHA function, then the order will be repeatable if the data is not changed. Regards Ian