couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Filippo Fadda <filippo.fa...@programmazione.it>
Subject Re: Sorting a view by a reduce result (or a better document structure or M/R)
Date Sat, 12 Oct 2013 20:37:53 GMT
So, the short answer is: you can't. The long answer is that I was figuring the same problem
and I have found the following solution.

Every time a user vote, the application saves a new document having doc.type == 'vote'. The
'vote' document has a property called 'recorded', that is really important.

The application uses a design document called 'votes', where is stored a view called 'notRecorded',
that contains just a map function like this one:

              function($doc) {
                if ((doc.type == 'vote') and (!doc.recorded))
                  emit(doc.timestamp, doc);
              }

As you can see, there is not a reduce function, since we need to mark those documents as recorded,
we'll reduce them later on our own.

There is a cron job that runs, in my case, a PHP script. You can choose to run the script
every minute, every five minutes, etc. The script queries the view defined above to get all
the 'not recorded' votes.
Once we have all the 'not recorded' votes, we use a simple function to reduce them. In my
case the function reduces them by postId. The function returns an associative array with the
sum of the 'not recorded' votes, using as keys the post identifiers.
Finally the same script queries another view of another design document scores/perPost, using
as keys the same post identifiers. This is the map function:

              function(doc)) {
                if (doc.type == 'score')
                  emit(doc.postId, doc);
              }

In the end, the script updates every single document of type 'score' retrieved before, adding
to the score.total the total of new voted counted. Then the script updates every single 'vote'
document marking it as recorded. So we'll never count them anymore. There is no race condition
here, no conflicts at all.

To get the posts ordered by score you just need to query another view, let's call it posts/byScore
(ddoc/viewName) like this:

              function(doc)) {
                if (doc.type == 'score')
                  emit(score.total, doc.postId);
              }

I suggest to use _bulk_docs API, within all_or_nothing option, to be sure to change all the
documents as an atomic operation.

I don't know if you are using PHP, but my library has some interesting features. First of
all, you can process a query result chunk by chunk (I use this approach to reduce votes),
and you can also returns null rows in case some post doesn't still have a score, because you
never registered votes. This simplify a lot your job.

This solution counts just the new votes. In a new vote is generated in the meantime, nothing
bad is gonna happen, no conflict is generated. You just operate on a strict set of posts,
just the one have new votes. You'll just change the score related to those documents. That's
all. :-)

So 3 views and 3 document types: post, vote, score. In your case you'll have user instead
of post, but the mechanic remains the same.

Bye

-Filippo

On Oct 11, 2013, at 5:09 PM, Tom van Neerijnen wrote:

> Hi all
> 
> I'm developing a ranking service I'd like to throw scores for multiple
> users at and later be able to retrieve a list of users ranked/ordered by
> their scores. Problem is if I make it time series data and accumulate the
> total score in a reduce I can't put the score in the key to sort by.
> 
> Does anyone have suggestions as to how I could re-jig my documents and/or
> M/R to accommodate this kind of output or should I simply consider storing
> a total document along side all my time series documents? My concern with
> storing a total document is I can never be sure the total in the document
> exactly matches my M/R over my time series data because I'd have to have an
> app query the view and, using the result of the view query, update the
> total document, so that's a last resort.
> 
> I considered sorting in a list function but we're potentially looking at
> millions of users so that's not really an option.
> 
> I'm imagining time series documents like so:
> "abcdefghi" : {
>  "userId": "jim",
>  "scoreChange": 10,
>  "country": "GB",
>  "game": "solitaire",
>  "date": "2013-10-08T13:50:19"
> }
> 
> "jklmnopqr" : {
>  "userId": "jim",
>  "scoreChange": -5,
>  "country": "GB",
>  "game": "solitaire",
>  "date": "2013-10-08T13:51:19"
> }
> 
> "jklmnopqr" : {
>  "userId": "james",
>  "scoreChange": 4,
>  "country": "GB",
>  "game": "freecell",
>  "date": "2013-10-08T13:52:19"
> }
> 
> I could then have this M/R for, as an example, ranks by game then user:
> "by_game": {
>    "map": "function(doc) { emit([doc.game, doc.userId], doc.scoreChange)}",
>    "reduce": "function(keys, values, rereduce) { return sum(values); }"
> }
> 
> and get this list from it:
> {"key":["freecell","james"],"value":4}
> {"key":["solitaire","jim"],"value":5},
> 
> Thanks


Mime
View raw message