incubator-couchdb-user mailing list archives

From "Mark J. Reed" <markjr...@gmail.com>
Subject Re: Struggling with a particular Map / Reduce
Date Tue, 17 Aug 2010 17:45:52 GMT
Bear in mind that this approach doesn't distinguish between multiple
instances of a given title within a single doc vs multiple docs.  From
your original email I thought you wanted to collapse multiples of the
same title within one document but count separately multiples if they
come from different docs.

If that's the case, you'll still want to make the list unique before
emitting it... but a double loop isn't the way to go there; something
like this will work:

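map: function(doc) {
  // Use an object as a set so each title is recorded once per doc.
  var titles = {};
  for (var i = 0; i < doc.titles.length; ++i) {
    titles[doc.titles[i]] = 1;
  }
  // Emit one row per distinct title in this doc.
  for (var title in titles) {
    emit(doc.author, title);
  }
}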
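
To see how that per-doc de-duplication fits with the [author, title] key and "_sum" suggested further down the thread, here is a rough sketch that can be run outside CouchDB. The emit() stub, groupSum() (a stand-in for "_sum" queried with group=true), topTitles() and the sample docs are all made up for illustration; they are not CouchDB APIs, and the top-3 step is meant to run in the client or a list function, not in the reduce.

```javascript
// Hypothetical stand-in for CouchDB's emit(): collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: one row per distinct title per doc, keyed by [author, title].
function map(doc) {
  var seen = {};
  var titles = doc.titles || [];
  for (var i = 0; i < titles.length; ++i) {
    seen[titles[i]] = 1;
  }
  for (var title in seen) {
    if (Object.prototype.hasOwnProperty.call(seen, title)) {
      emit([doc.author, title], 1);
    }
  }
}

// Assumed stand-in for "_sum" with group=true: one total per distinct key.
function groupSum(mapRows) {
  var totals = {};
  mapRows.forEach(function (row) {
    var k = JSON.stringify(row.key);
    totals[k] = (totals[k] || 0) + row.value;
  });
  return Object.keys(totals).map(function (k) {
    return { key: JSON.parse(k), value: totals[k] };
  });
}

// Client side: keep the n most frequent titles for each author.
function topTitles(grouped, n) {
  var byAuthor = {};
  grouped.forEach(function (row) {
    var author = row.key[0];
    byAuthor[author] = byAuthor[author] || [];
    byAuthor[author].push({ title: row.key[1], count: row.value });
  });
  Object.keys(byAuthor).forEach(function (author) {
    byAuthor[author].sort(function (a, b) { return b.count - a.count; });
    byAuthor[author] = byAuthor[author].slice(0, n);
  });
  return byAuthor;
}

// Made-up sample docs: 'a' is repeated inside the first doc but counts once.
var docs = [
  { author: 'ian', titles: ['a', 'b', 'a', 'c'] },
  { author: 'ian', titles: ['a', 'b'] },
  { author: 'ian', titles: ['a', 'd'] }
];
docs.forEach(map);
var top = topTitles(groupSum(rows), 3);
console.log(JSON.stringify(top));
```

The counts here are per document, so a title repeated inside one doc still contributes only 1 for that doc.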


On Tue, Aug 17, 2010 at 8:17 AM, Ian Wootten <i.wootten@gmail.com> wrote:
> Thanks guys. I'd been working toward a solution with multiple level
> keys but had missed this approach for some reason. It's nice to know
> that at least some part of it has to be implemented in code.
>
> Not fully understanding what was being received by the reduce function
> and how it could be worked upon was the source of my problems.
>
> Anyway, I can get what I require from my view now, thanks for the help.
>
> On 17 August 2010 11:37, Robert Newson <robert.newson@gmail.com> wrote:
>> If you emit([doc.docAuthor, doc.titles[title]], 1) instead you could
>> use the built-in Erlang reduce function "_sum" instead, which is
>> faster.
>>
>> B.
>>
>> On Tue, Aug 17, 2010 at 10:24 AM, Martin Higham <martin@ocasta.co.uk> wrote:
>>> I think it would be better to use the View to split the titles and create a
>>> list of Authors and Titles. A Map function such as
>>>
>>> function(doc) {
>>>  for (var title in doc.titles)
>>>      emit([doc.docAuthor, doc.titles[title]], null);
>>> }
>>>
>>> does just this.
>>>
>>> You now have a list of keys in the form [Author, title] and they are sorted
>>> by Author.
>>>
>>> It's easy to then take these and produce a list of unique Author/title
>>> combinations and a count of their frequency with the Reduce function.
>>>
>>> function(keys, values, rereduce) {
>>>  if (rereduce) {
>>>    return sum(values);
>>>  }
>>>  else {
>>>    return values.length;
>>>  }
>>> }
>>>
>>> However it is difficult for reduce to produce a list of the top 3. Any
>>> processing within the Reduce can only operate on the data passed in. It
>>> doesn't know what data is yet to come. If you were to output only the top 3
>>> entries passed in to a given invocation of the Reduce you would produce
>>> inaccurate results as you would potentially throw away rows that might yet
>>> accumulate into the all time top 3.
>>>
>>> Once you have a list of unique Author/title pairs and their frequency you
>>> can either sort and filter them within the client or within a list function.
>>>
>>> Hope this helps
>>>
>>> Martin
>>>
>>>
>>> On 17 August 2010 09:26, Ian Wootten <i.wootten@gmail.com> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I was hoping somebody might be able to solve a problem I'm having
>>>> attempting to implement a view at the moment.
>>>>
>>>> Essentially, what it does is to take a collection of documents which
>>>> each have a single author and a list of names (which are possibly
>>>> repeated). There may be multiple documents by the same author, with
>>>> the same names within. Here's an example doc.
>>>>
>>>> doc.author
>>>> doc.titles = ['sometitle', 'someothertitle', 'sometitle', 'anothertitle']
>>>>
>>>> I would like to return a list of the top 3 titles for each
>>>> author across all documents. I have tried and failed for several days
>>>> to get this working correctly.
>>>>
>>>> So far, my map is as follows, giving the unique titles for a document,
>>>> not ordered at all:
>>>>
>>>> function(doc) {
>>>>
>>>>  var unique_titles = [];
>>>>
>>>>  for(var i in doc.titles)
>>>>  {
>>>>     var count=0;
>>>>
>>>>       for(var j in unique_titles)
>>>>       {
>>>>         if(doc.titles[i]==unique_titles[j])
>>>>         {
>>>>            count++;
>>>>         }
>>>>       }
>>>>
>>>>       if(count==0)
>>>>       {
>>>>         unique_titles.push(doc.titles[i]);
>>>>       }
>>>>  }
>>>>
>>>>  for(var k=0; k<unique_titles.length;k++)
>>>>  {
>>>>    emit(doc.author, unique_titles[k]);
>>>>  }
>>>> }
>>>>
>>>> My reduce is as follows; this returns two unique titles from a single
>>>> document when only a single document exists for an author (I think):
>>>>
>>>> function(keys, values, rereduce) {
>>>>  return values.splice(0,2);
>>>> }
>>>>
>>>> My problem is that:
>>>>
>>>> a) I can't return more than 2 items from the values array (if I set
>>>> the splice length to 3, futon spits back a non-reducing error at me).
>>>> b) Where multiple documents exist for the same author, in some
>>>> instances I see a weird multi-dimensional array returned (rather than
>>>> just two values). For example:
>>>> [['sometitle','someothertitle'],['anothertitle'],['afurthertitle']]
>>>>
>>>> Presumably b) is the result of multiple documents for a single author
>>>> interfering with one another, though I'm confused as to how I
>>>> configure my map/reduce in order to get the information I'm after (I
>>>> also wonder if it's even possible).
>>>>
>>>> I've attempted to understand the documentation on reduce functions,
>>>> taking a look at the many examples that exist too, but I'm unable to
>>>> understand them well enough to solve my problem.
>>>>
>>>> I'd appreciate any help on this!
>>>>
>>>> Thanks,
>>>>
>>>> Ian
>>>>
>>>
>>
>



-- 
Mark J. Reed <markjreed@gmail.com>
