couchdb-user mailing list archives

From "Eli Stevens (Gmail)" <wickedg...@gmail.com>
Subject Re: Random Document
Date Tue, 21 Sep 2010 22:19:27 GMT
Unless there are additional restrictions that can be imposed, I'm
pretty sure that you're going to end up needing to get the full list
of IDs, and select x of them at random without replacement to fully
match 'SORT BY RANDOM LIMIT X'.
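
For illustration, the "pull every ID and pick x without replacement" step
might look roughly like the sketch below. It assumes the IDs have already
been collected into an array (for instance from the rows of a GET on the
database's _all_docs); the function name and the injectable rng parameter
are made up for this example and are not part of CouchDB.

```javascript
// Sketch: choose x IDs uniformly at random, without replacement.
// `ids` is assumed to be the full list of document IDs, already fetched.
// `rng` is a hypothetical hook returning numbers in [0, 1), like Math.random.
function sampleWithoutReplacement(ids, x, rng = Math.random) {
  const pool = ids.slice();                 // copy, so the caller's array is untouched
  const n = Math.min(x, pool.length);       // cannot pick more docs than exist
  // Partial Fisher-Yates shuffle: after step i, pool[0..i] is a uniform sample.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```

The cost is one pass over the ID list on the client, which is the price of
matching the SQL semantics exactly.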

However, depending on what you are doing with them, it's possible that
other approaches might work.  For example, you could add a
'uniformRandomValue' key to the doc which is set at document creation,
and have a view that does emit(doc.uniformRandomValue, doc._id), then
when you query the view you can (again, depending on what you're doing
with the random selection) either pick the lowest keys ('&limit=X') or
pick a random startkey in the range along with the limit
('&startkey=0.12345&limit=X').  That works great when you're using
the docs as something like a work queue, where after being chosen
once, the docs are removed from the queue.  However, if the docs stick
around, you can end up with problems.  Imagine your
doc.uniformRandomValues look like:

urv: id
0.1: A
0.7: B
0.8: C
0.9: D

Selecting from this distribution with a random startkey and limit of 2
makes it very unlikely that A or D are selected, unless you remove B
and C after they're picked the first time, and implement some sort of
wrap-around to get A if the startkey is 0.85.
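
To make the skew concrete, here is a small simulation sketch. It does not
talk to CouchDB at all: queryView is a made-up stand-in that mimics
startkey plus limit on the four example rows, and the wrap flag is one
possible reading of "some sort of wrap-around", not an existing option.

```javascript
// The view assumed here is the one described above:
//   function (doc) { emit(doc.uniformRandomValue, doc._id); }
// Rows as that view would return them, sorted by key.
const rows = [
  { key: 0.1, id: "A" },
  { key: 0.7, id: "B" },
  { key: 0.8, id: "C" },
  { key: 0.9, id: "D" },
];

// Mimics ?startkey=K&limit=N: the first N rows whose key is >= K.
// With wrap = true, a short result is topped up from the start of the view.
function queryView(rows, startkey, limit, wrap = false) {
  const start = rows.findIndex((r) => r.key >= startkey);
  const from = start === -1 ? rows.length : start;
  const picked = rows.slice(from, from + limit);
  if (wrap && picked.length < limit) {
    // Only take rows before `from`, so no row is returned twice.
    picked.push(...rows.slice(0, Math.min(limit - picked.length, from)));
  }
  return picked.map((r) => r.id);
}

// Estimates how often each doc is selected over many random startkeys.
function selectionRates(rows, limit, wrap, trials = 100000, rng = Math.random) {
  const counts = Object.fromEntries(rows.map((r) => [r.id, 0]));
  for (let t = 0; t < trials; t++) {
    for (const id of queryView(rows, rng(), limit, wrap)) counts[id]++;
  }
  for (const id in counts) counts[id] /= trials;
  return counts;
}

console.log(selectionRates(rows, 2, false));
console.log(selectionRates(rows, 2, true));
```

Without wrap-around the rates should come out near A 0.1, B 0.7, C 0.7,
D 0.2: each doc is picked in proportion to the gaps in front of it, not
equally often.  Wrap-around on its own helps A but does not even things
out, which is why removing docs once they have been picked matters too.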

If that kind of approach doesn't work for you, then it would be
helpful to know more about the requirements.  :)

HTH,
Eli


On Tue, Sep 21, 2010 at 2:49 PM, Peter Braden
<PeterBraden@peterbraden.co.uk> wrote:
> Hi,
>
> I'm after a) - the equivalent of a 'SORT BY RANDOM LIMIT x' sql statement.
>
>>> But as this isn't deterministic, I'm pretty sure it's wrong.
>>
>> I don't follow your logic. The view will show all documents in a random
>> order. The fact that it is unrepeatable may make it useless for your
>> purposes, but it does not make the maths invalid, or the statistics wrong.
>
> As far as I know, the couchdb internals rely on the fact that view keys are
> deterministic to do their view updates.
>
> I'm not entirely convinced that my current function produces a good random
> selection - if a document is updated more, and therefore its view entry is
> updated more, does that mean it has a different chance of being selected?
>
> Cheers,
>
> Peter
>
>
>
> On 21 September 2010 20:25, Ian Hobson <ian@ianhobson.co.uk> wrote:
>
>> On 21/09/2010 18:27, Peter Braden wrote:
>>
>>> Hi,
>>>
>>> Is there a good way to get a random document from a database?
>>>
>> Hmm, that depends upon what you mean by "good", and "random" and if you
>> want a repeatable result! I guess I'm asking what exactly are you trying to
>> do?
>>
>> a) Pick a representative, and statistically defensible sample of size X
>> from a population of Y documents where each document has an equal
>> probability of being selected, and cannot be selected twice.
>>
>> b) Take a sample of size 1 from a population of Y, X times (so a given
>> document could be taken more than once)?
>>
>> c) Something similar to a or b where you don't know Y in advance?
>>
>> d) Shuffle the documents?
>>
>>
>>> I'm currently using a view that does:
>>>
>>> function(doc) {
>>>     emit(Math.random(), doc);
>>> };
>>>
>>> But as this isn't deterministic, I'm pretty sure it's wrong.
>>>
>> I don't follow your logic. The view will show all documents in a random
>> order. The fact that it is unrepeatable may make it useless for your
>> purposes, but it does not make the maths invalid, or the statistics wrong.
>>
>> Regards
>>
>> Ian
>>
>
>
>
> --
> Peter Braden
>
> <http://PeterBraden.co.uk/>
>



-- 
Eli
