couchdb-user mailing list archives

From Michael Jackson <mjijack...@gmail.com>
Subject Re: Hitting reduce_limit with "good" reduce function
Date Thu, 11 Apr 2013 22:40:30 GMT
On Thu, Apr 11, 2013 at 11:21 AM, Robert Newson <rnewson@apache.org> wrote:

> That's not "good" enough for a reduce. A reduce function has to return
> a value *smaller* than the input values, and preferably a lot smaller.
> Returning something as large as a full document is among the set of
> things reduce_limit is trying to discourage.
>

I anticipate that in most cases the reduce value *will* end up being a lot
smaller than the inputs. For example, after I've seen a user a thousand
times, all of those values will reduce to just one timestamp (the most
recent one) on the user document.


> For this problem, you should emit your timestamps in the map phase and
> use endkey and limit to find the latest. It's not a reduce problem.
>

Thanks for the suggestion. I can totally accept that this isn't a problem
with reduce.

However, I'd still like to do it. :D The main reason is that I'd like to be
able to fetch many users at a time, along with information about when they
were last seen. Using the method I've described I can query the reduce view
with many keys and group=true.
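
As a rough sketch of that multi-key query: CouchDB accepts a POST to a view
with a JSON body of the form {"keys": [...]}, and group=true goes in the
query string. The helper name lastSeenRequest and the database, design
document, and view names in the usage note are hypothetical, not taken from
my actual setup.

```javascript
// Hypothetical helper: builds the pieces of an HTTP request that asks a
// reduce view for several keys at once, grouped by key.
function lastSeenRequest(db, ddoc, view, profileIds) {
  return {
    method: 'POST',
    path: '/' + encodeURIComponent(db) +
          '/_design/' + encodeURIComponent(ddoc) +
          '/_view/' + encodeURIComponent(view) +
          '?group=true',
    headers: { 'Content-Type': 'application/json' },
    // One reduced row per requested key comes back in the response.
    body: JSON.stringify({ keys: profileIds })
  };
}
```

For example, lastSeenRequest('mydb', 'users', 'last_seen', ['u1', 'u2'])
describes a single request that would cover both users.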

I may not be understanding correctly, but emitting just the timestamps in
the map would require me to make two queries: one for the timestamp and one
for the user document.
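
For comparison, here is a minimal sketch of how I read the map-only
suggestion. It assumes presence documents carry a timestamp property (the
reduce below relies on the same thing). The names mapPresence and
latestSeenParams are hypothetical, and emit is passed in as an argument only
so the function can be exercised outside CouchDB, where emit is a global.

```javascript
// Hypothetical map for the timestamp-only approach: one row per presence
// document, keyed by [profile_id, timestamp].
function mapPresence(doc, emit) {
  if (doc.type === 'presence' && doc.profile_id) {
    emit([doc.profile_id, doc.timestamp], null);
  }
}

// Query parameters selecting the newest row for one user. With
// descending=true the range runs from the high end of that user's keys
// ([id, {}], since objects collate last) down to [id].
function latestSeenParams(profileId) {
  return {
    descending: true,
    startkey: JSON.stringify([profileId, {}]),
    endkey: JSON.stringify([profileId]),
    limit: 1
  };
}
```

This returns the latest timestamp for one user per request, which is why
fetching the user document as well looks like a second query to me.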

For reference, here's my map/reduce:

function map(doc) {
  // A profile document is a user.
  if (doc.type === 'profile') {
    emit(doc._id, doc);
  } else if (doc.type === 'presence' && doc.profile_id) {
    emit(doc.profile_id, doc);
  }
}

function reduce(key, values, rereduce) {
  var profile, timestamp;

  var value;
  for (var i = 0, len = values.length; i < len; ++i) {
    value = values[i];

    // At least one of the values is a profile document. In a
    // rereduce we could see many of these, but we don't care
    // because they are the same document.
    if (value.type === 'profile') {
      profile = value;
    }

    // Find the most recent timestamp. In a rereduce the value
    // may be a profile document with a timestamp property.
    if (value.timestamp && (!timestamp || timestamp < value.timestamp)) {
      timestamp = value.timestamp;
    }
  }

  if (profile && timestamp) {
    profile.timestamp = timestamp;
  }

  return profile;
}
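
One caveat with the reduce above: it returns undefined whenever a batch of
values happens to contain no profile document, and CouchDB does not promise
that every reduce call for a key will include one. Below is a hedged sketch
of a variant that carries the timestamp forward in that case. The name
reduceLastSeen and the { timestamp: ... } partial result are my own
assumptions for illustration, and this does nothing about the size of the
returned value, so it would not by itself satisfy reduce_limit.

```javascript
// Sketch of a reduce that tolerates batches without a profile document.
// A partial result is either a profile (possibly with a timestamp) or a
// bare { timestamp: ... } object; both shapes are accepted on rereduce.
function reduceLastSeen(key, values, rereduce) {
  var profile = null;
  var timestamp = null;

  for (var i = 0, len = values.length; i < len; ++i) {
    var value = values[i];
    if (!value) continue; // skip empty partial results

    if (value.type === 'profile') {
      profile = value;
    }

    // Keep the most recent timestamp seen so far.
    if (value.timestamp && (timestamp === null || timestamp < value.timestamp)) {
      timestamp = value.timestamp;
    }
  }

  if (profile) {
    if (timestamp !== null) profile.timestamp = timestamp;
    return profile;
  }

  // No profile in this batch: pass the timestamp along for a later rereduce.
  return timestamp !== null ? { timestamp: timestamp } : null;
}
```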

I realize it's a bit messy, but it does save me another trip to the
database and it is easily one of the most common queries I'll be doing, so
it's an optimization that seems worthwhile at this point. If there's a
cleaner/better way to get all the same data in one trip to the database,
I'd love to hear it.

--
Michael Jackson
@mjackson
