couchdb-user mailing list archives

From Ian Wootten <i.woot...@gmail.com>
Subject Re: Struggling with a particular Map / Reduce
Date Wed, 18 Aug 2010 12:12:07 GMT
Thanks Mark.

I've actually ended up using a combination of Martin's, Robert's and
your approaches.

Seems to be giving me everything I need now.
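
For anyone reading the archive later, here is a rough sketch of how the three suggestions quoted below might fit together. It is an illustration under assumptions, not necessarily the exact view that was deployed: the sample documents, the emit() stand-in and the helpers sumByKey and topTitles are hypothetical and exist only so the snippet runs outside CouchDB. In a real view the map body would be paired with the built-in "_sum" reduce and queried with grouping (for example group=true), with the top 3 picked out on the client or in a list function.

```javascript
// Hypothetical sample documents; field names follow the thread (author, titles).
const docs = [
  { author: "ann", titles: ["a", "b", "a", "c"] },
  { author: "ann", titles: ["a", "b", "d"] },
  { author: "ann", titles: ["a", "d", "e"] },
  { author: "bob", titles: ["x", "x"] }
];

// Stand-in for CouchDB's emit(): collects rows so the map can run locally.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: collapse repeated titles within one doc (Mark's suggestion), then
// emit a composite [author, title] key with value 1 (Martin's and Robert's).
function map(doc) {
  var seen = {};
  for (var i = 0; i < doc.titles.length; ++i) {
    seen[doc.titles[i]] = 1;
  }
  for (var title in seen) {
    emit([doc.author, title], 1);
  }
}

docs.forEach(map);

// Local stand-in for "_sum" with grouping on the full key:
// totals the emitted values per distinct [author, title] pair.
function sumByKey(mapRows) {
  const totals = new Map();
  for (const row of mapRows) {
    const k = JSON.stringify(row.key);
    totals.set(k, (totals.get(k) || 0) + row.value);
  }
  return Array.from(totals, ([k, value]) => ({ key: JSON.parse(k), value }));
}

// Client-side step: sort each author's titles by count and keep the first n.
// Ties are broken alphabetically so the result is deterministic.
function topTitles(reduced, n) {
  const byAuthor = {};
  for (const { key: [author, title], value } of reduced) {
    (byAuthor[author] = byAuthor[author] || []).push({ title, count: value });
  }
  for (const author in byAuthor) {
    byAuthor[author].sort(
      (x, y) => y.count - x.count || x.title.localeCompare(y.title)
    );
    byAuthor[author] = byAuthor[author].slice(0, n);
  }
  return byAuthor;
}

const top = topTitles(sumByKey(rows), 3);
console.log(JSON.stringify(top));
```

Doing the ranking outside the reduce keeps the reduce a plain sum, which is what allows the built-in "_sum" to be used.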

On 17 August 2010 18:45, Mark J. Reed <markjreed@gmail.com> wrote:
> Bear in mind that this approach doesn't distinguish between multiple
> instances of a given title within a single doc vs multiple docs.  From
> your original email I thought you wanted to collapse multiples of the
> same title within one document but count separately multiples if they
> come from different docs.
>
> If that's the case, you'll still want to make the list unique before
> emitting it... but a double loop isn't the way to go there; something
> like this will work:
>
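> map: function(doc) {
>  // Use an object as a set: each distinct title becomes one key
>  var titles={};
>  for (var i=0; i<doc.titles.length; ++i) {
>     titles[doc.titles[i]] = 1;
>  }
>  // Emit each distinct title once per document
>  for (var title in titles) {
>     emit(doc.author, title);
>  }
> }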
>
>
> On Tue, Aug 17, 2010 at 8:17 AM, Ian Wootten <i.wootten@gmail.com> wrote:
>> Thanks guys. I'd been working toward a solution with multiple level
>> keys but had missed this approach for some reason. It's nice to know
>> that at least some part of it has to be implemented in code.
>>
>> Not fully understanding what was being received by the reduce function
>> and how it could be worked upon was the source of my problems.
>>
>> Anyway, I can get what I require from my view now, thanks for the help.
>>
>> On 17 August 2010 11:37, Robert Newson <robert.newson@gmail.com> wrote:
>>> If you emit([doc.docAuthor, doc.titles[title]], 1) instead you could
>>> use the built-in Erlang reduce function "_sum" instead, which is
>>> faster.
>>>
>>> B.
>>>
>>> On Tue, Aug 17, 2010 at 10:24 AM, Martin Higham <martin@ocasta.co.uk> wrote:
>>>> I think it would be better to use the View to split the titles and create a
>>>> list of Authors and Titles. A Map function such as
>>>>
>>>> function(doc) {
>>>>  for (var title in doc.titles)
>>>>      emit([doc.docAuthor, doc.titles[title]], null);
>>>> }
>>>>
>>>> does just this.
>>>>
>>>> You now have a list of keys in the form [Author, title] and they are sorted
>>>> by Author.
>>>>
>>>> It's easy to then take these and produce a list of unique Author/title
>>>> combinations and a count of their frequency with the Reduce function.
>>>>
>>>> function(keys, values, rereduce) {
>>>>  if (rereduce) {
>>>>    return sum(values);
>>>>  }
>>>>  else {
>>>>    return values.length;
>>>>  }
>>>> }
>>>>
>>>> However it is difficult for reduce to produce a list of the top 3. Any
>>>> processing within the Reduce can only operate on the data passed in. It
>>>> doesn't know what data is yet to come. If you were to output only the top 3
>>>> entries passed in to a given invocation of the Reduce you would produce
>>>> inaccurate results as you would potentially throw away rows that might yet
>>>> accumulate into the all time top 3.
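
The point above can be checked with a small worked example. This is a sketch with made-up counts: batch1 and batch2 stand for the per-title totals one author might have in two separate reduce invocations, and merge, topN and truncate are hypothetical helpers, not CouchDB functions. It compares merging the full counts against merging counts that were each cut down to their own top 3 first.

```javascript
// Hypothetical per-title counts for one author, as two separate
// reduce invocations might see them.
const batch1 = { a: 5, b: 4, c: 3, d: 2 };
const batch2 = { d: 4, e: 1 };

// Add two count tables together.
function merge(x, y) {
  const out = Object.assign({}, x);
  for (const t in y) {
    out[t] = (out[t] || 0) + y[t];
  }
  return out;
}

// The n highest [title, count] pairs; ties are broken alphabetically.
function topN(counts, n) {
  return Object.entries(counts)
    .sort((p, q) => q[1] - p[1] || p[0].localeCompare(q[0]))
    .slice(0, n);
}

// What a reduce that "only keeps the top n" would pass on.
function truncate(counts, n) {
  return Object.fromEntries(topN(counts, n));
}

// Correct: merge everything, then take the top 3.
const exact = topN(merge(batch1, batch2), 3);

// Lossy: each invocation keeps only its own top 3 before merging.
const lossy = topN(merge(truncate(batch1, 3), truncate(batch2, 3)), 3);

console.log("exact:", JSON.stringify(exact));
console.log("lossy:", JSON.stringify(lossy));
```

With these numbers title "d" is fourth in the first batch, so the truncating version drops its 2 occurrences there; "d" then ends up with a count of 4 instead of 6 and is no longer ranked first.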
>>>>
>>>> Once you have a list of unique Author/title pairs and their frequency you
>>>> can either sort and filter them within the client or within a list function.
>>>>
>>>> Hope this helps
>>>>
>>>> Martin
>>>>
>>>>
>>>> On 17 August 2010 09:26, Ian Wootten <i.wootten@gmail.com> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I was hoping somebody might be able to solve a problem I'm having
>>>>> attempting to implement a view at the moment.
>>>>>
>>>>> Essentially, what it does is to take a collection of documents which
>>>>> each have a single author and a list of titles (which are possibly
>>>>> repeated). There may be multiple documents by the same author, with
>>>>> the same titles within. Here's an example doc.
>>>>>
>>>>> doc.author
>>>>> doc.titles = ['sometitle', 'someothertitle', 'sometitle', 'anothertitle']
>>>>>
>>>>> I would like to return a list of the top 3 titles for each
>>>>> author across all documents. I have tried and failed for several days
>>>>> to get this working correctly.
>>>>>
>>>>> So far, my map is as follows, giving the unique titles for a document,
>>>>> not ordered at all:
>>>>>
>>>>> function(doc) {
>>>>>
>>>>>  var unique_titles = [];
>>>>>
>>>>>  for(var i in doc.titles)
>>>>>  {
>>>>>     var count=0;
>>>>>
>>>>>       for(var j in unique_titles)
>>>>>       {
>>>>>         if(doc.titles[i]==unique_titles[j])
>>>>>         {
>>>>>            count++;
>>>>>         }
>>>>>       }
>>>>>
>>>>>       if(count==0)
>>>>>       {
>>>>>         unique_titles.push(doc.titles[i]);
>>>>>       }
>>>>>  }
>>>>>
>>>>>  for(var k=0; k<unique_titles.length;k++)
>>>>>  {
>>>>>    emit(doc.author, unique_titles[k]);
>>>>>  }
>>>>> }
>>>>>
>>>>> My reduce is as follows; this returns two unique titles from a single
>>>>> document when only a single document exists for an author (I think):
>>>>>
>>>>> function(keys, values, rereduce) {
>>>>>  return values.splice(0,2);
>>>>> }
>>>>>
>>>>> My problem is that:
>>>>>
>>>>> a) I can't return more than 2 items from the values array (if I set
>>>>> the splice length to 3, futon spits back a non-reducing error at me).
>>>>> b) Where multiple documents exist for the same author, in some
>>>>> instances I see a weird multi-dimensional array returned (rather than
>>>>> just two values). For example:
>>>>> [['sometitle','someothertitle'],['anothertitle'],['afurthertitle']]
>>>>>
>>>>> Presumably b) is the result of multiple documents for a single author
>>>>> interfering with one another, though I'm confused as to how I
>>>>> configure my map/reduce in order to get the information I'm after (I
>>>>> also wonder if it's even possible).
>>>>>
>>>>> I've attempted to understand the documentation on reduce functions,
>>>>> taking a look at the many examples that exist too, but I'm unable to
>>>>> understand them well enough to solve my problem.
>>>>>
>>>>> I'd appreciate any help on this!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ian
>>>>>
>>>>
>>>
>>
>
>
>
> --
> Mark J. Reed <markjreed@gmail.com>
>
