incubator-couchdb-user mailing list archives

From Martin Higham <mar...@ocasta.co.uk>
Subject Re: Struggling with a particular Map / Reduce
Date Tue, 17 Aug 2010 09:24:11 GMT
I think it would be better to use the View to split the titles and create a
list of Authors and Titles. A Map function such as

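function(doc) {
  // Skip documents that lack the fields used below
  if (!doc.author || !doc.titles) return;
  // One row per title occurrence, keyed by [author, title]
  for (var i = 0; i < doc.titles.length; i++)
      emit([doc.author, doc.titles[i]], null);
}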

does just this.

You now have a list of keys in the form [Author, title], sorted by Author
and then by title.
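
To see the shape of those rows without a running CouchDB, here is a minimal
sketch in plain JavaScript. The emit() stub, the sample documents and the
sort are assumptions for illustration only; the field name doc.author follows
the original question, and CouchDB's real key collation is more involved than
the simple string comparison used here.

```javascript
// Stand-in for CouchDB's emit(): collects rows so the map can run anywhere.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function: one row per title occurrence, keyed by [author, title].
function map(doc) {
  for (var i = 0; i < doc.titles.length; i++) {
    emit([doc.author, doc.titles[i]], null);
  }
}

// Hypothetical sample documents.
var docs = [
  { author: "bob", titles: ["sometitle", "someothertitle", "sometitle", "anothertitle"] },
  { author: "alice", titles: ["sometitle"] }
];
docs.forEach(map);

// Approximate the view's key ordering: compare author first, then title.
rows.sort(function (a, b) {
  for (var i = 0; i < 2; i++) {
    if (a.key[i] < b.key[i]) return -1;
    if (a.key[i] > b.key[i]) return 1;
  }
  return 0;
});
```

Repeated titles appear as repeated rows with the same key, which is what lets
a reduce count them later.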

It's easy to then take these and produce a list of unique Author/title
combinations and a count of their frequency with the Reduce function.

function(keys, values, rereduce) {
  if (rereduce) {
    return sum(values);
  }
  else {
    return values.length;
  }
}
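
As a rough illustration of how the two branches fit together, the sketch below
calls the reduce by hand. The sum() stub stands in for the helper the CouchDB
JavaScript view server provides, and the chunking and document ids are
made up; CouchDB decides for itself how rows are split between invocations.

```javascript
// Stand-in for the sum() helper available inside CouchDB view functions.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

function reduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts returned by earlier reduce calls
    return sum(values);
  }
  // values are the raw map values (null here), one per emitted row
  return values.length;
}

// Five rows for the same [author, title] key, split into two hypothetical chunks.
var key = ["bob", "sometitle"];
var partialA = reduce([[key, "doc1"], [key, "doc1"], [key, "doc2"]], [null, null, null], false);
var partialB = reduce([[key, "doc3"], [key, "doc4"]], [null, null], false);

// The rereduce step combines the partial counts; keys is null in that call.
var total = reduce(null, [partialA, partialB], true);
```

On CouchDB versions that ship the built-in reduce functions, setting the
reduce to the string _count should give the same counts.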

However it is difficult for reduce to produce a list of the top 3. Any
processing within the Reduce can only operate on the data passed in. It
doesn't know what data is yet to come. If you were to output only the top 3
entries passed in to a given invocation of the Reduce you would produce
inaccurate results as you would potentially throw away rows that might yet
accumulate into the all time top 3.
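
A small sketch of that failure, in plain JavaScript with made-up counts for
one author. The helper names mergeCounts and topN are hypothetical and only
serve the illustration: title "d" is never in the top 3 of either chunk, yet
it is the most frequent title overall.

```javascript
// Add up per-title counts from several partial results.
function mergeCounts(parts) {
  var total = {};
  parts.forEach(function (part) {
    Object.keys(part).forEach(function (title) {
      total[title] = (total[title] || 0) + part[title];
    });
  });
  return total;
}

// Keep only the n highest counts (ties broken alphabetically).
function topN(counts, n) {
  var kept = {};
  Object.keys(counts)
    .sort(function (a, b) {
      return counts[b] - counts[a] || (a < b ? -1 : a > b ? 1 : 0);
    })
    .slice(0, n)
    .forEach(function (title) { kept[title] = counts[title]; });
  return kept;
}

// Counts as two separate reduce invocations might see them.
var chunk1 = { a: 3, b: 3, c: 3, d: 2 };
var chunk2 = { e: 3, f: 3, g: 3, d: 2 };

// Correct: merge everything first, then take the top 3.
var exact = topN(mergeCounts([chunk1, chunk2]), 3);

// Incorrect: truncate each chunk to its top 3 before merging.
var truncated = topN(mergeCounts([topN(chunk1, 3), topN(chunk2, 3)]), 3);
```

Here exact ranks "d" first with a count of 4, while truncated has lost it
completely.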

Once you have a list of unique Author/title pairs and their frequency (query
the view with group=true to get one row per pair), you can either sort and
filter them within the client or within a list function.
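
On the client side, the sort and filter could look something like the sketch
below. It assumes rows shaped like the view output when queried with
group=true ({key: [author, title], value: count}); the sample data and the
function name topTitlesByAuthor are hypothetical.

```javascript
// Hypothetical grouped rows, one per unique [author, title] pair.
var rows = [
  { key: ["alice", "sometitle"], value: 1 },
  { key: ["bob", "afurthertitle"], value: 1 },
  { key: ["bob", "anothertitle"], value: 2 },
  { key: ["bob", "someothertitle"], value: 3 },
  { key: ["bob", "sometitle"], value: 5 }
];

// Group rows by author, sort each group by count (highest first), keep n.
function topTitlesByAuthor(rows, n) {
  var byAuthor = {};
  rows.forEach(function (row) {
    var author = row.key[0];
    if (!byAuthor[author]) byAuthor[author] = [];
    byAuthor[author].push({ title: row.key[1], count: row.value });
  });
  Object.keys(byAuthor).forEach(function (author) {
    byAuthor[author].sort(function (a, b) { return b.count - a.count; });
    byAuthor[author] = byAuthor[author].slice(0, n);
  });
  return byAuthor;
}

var top = topTitlesByAuthor(rows, 3);
```

The same grouping logic could probably be adapted to run inside a list
function, so that only the top 3 per author leave the server.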

Hope this helps

Martin


On 17 August 2010 09:26, Ian Wootten <i.wootten@gmail.com> wrote:

> Hi Everyone,
>
> I was hoping somebody might be able to solve a problem I'm having
> attempting to implement a view at the moment.
>
> Essentially, what it does is to take a collection of documents which
> each have a single author and a list of names (which are possibly
> repeated). There may be multiple documents by the same author, with
> the same names within. Here's an example doc.
>
> doc.author
> doc.titles = ['sometitle', 'someothertitle', 'sometitle', 'anothertitle']
>
> I would like to return a list of the top 3 titles for each
> author across all documents. I have tried and failed for several days
> to get this working correctly.
>
> So far, my map is as follows, giving the unique titles for a document,
> not ordered at all:
>
> function(doc) {
>
>  var unique_titles = [];
>
>  for(var i in doc.titles)
>  {
>     var count=0;
>
>       for(var j in unique_titles)
>       {
>         if(doc.titles[i]==unique_titles[j])
>         {
>            count++;
>         }
>       }
>
>       if(count==0)
>       {
>         unique_titles.push(doc.titles[i]);
>       }
>  }
>
>  for(var k=0; k<unique_titles.length;k++)
>  {
>    emit(doc.author, unique_titles[k]);
>  }
> }
>
> My reduce is as follows, this returns two unique titles from a single
> document when only a single document exists for an author(I think):
>
> function(keys, values, rereduce) {
>  return values.splice(0,2);
> }
>
> My problem is that:
>
> a) I can't return more than 2 items from the values array (if I set
> the splice length to 3, futon spits back a non-reducing error at me).
> b) Where multiple documents exist for the same author, in some
> instances I see a weird multi-dimensional array returned (rather than
> just two values). For example:
> [['sometitle','someothertitle'],['anothertitle'],['afurthertitle']]
>
> Presumably b) is the result of multiple documents for a single author
> interfering with one another, though I'm confused as to how I
> configure my map/reduce in order to get the information I'm after (I
> also wonder if it's even possible).
>
> I've attempted to understand the documentation on reduce functions,
> taking a look at the many examples that exist too, but I'm unable to
> understand them well enough to solve my problem.
>
> I'd appreciate any help on this!
>
> Thanks,
>
> Ian
>
