incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: on Reduce w/ Python view server
Date Tue, 20 Jan 2009 04:44:36 GMT
On Mon, Jan 19, 2009 at 6:15 PM, Jeff Hinrichs - DM&T
<dundeemt@gmail.com> wrote:
> On Mon, Jan 19, 2009 at 8:24 PM, Jeff Hinrichs - DM&T
> <dundeemt@gmail.com> wrote:
>>
>>
>> On Mon, Jan 19, 2009 at 6:25 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>
>>> On Mon, Jan 19, 2009 at 7:15 PM, Jeff Hinrichs - DM&T
>>> <jeffh@dundeemt.com> wrote:
>>> > Using couchdb-python 0.5 and couchdb 0.9.0a735191-incubating
>>> >
>>> > I am working on a "rosetta stone" for javascript and python views. (nothing
>>> > like rewriting in a different language to learn something<g>) I've worked
>>> > through the simple map only views.  I then did a simple reduce(keys,vals)
>>> > which worked out fine.  But I am stumped on rereduce when the function
>>> > signature includes keys,vals,rereduce.  I've been trying to work through the
>>> > js code from "Top N Tags" from the snippets page,
>>> > http://wiki.apache.org/couchdb/View_Snippets
>>> >
>>> > In particular, I am confused as to what is passed to the rereduce function
>>> > when rereduce=True.  The javascript is returning a complex structure instead
>>> > of a simple scalar, I see that it has to do with getting state back from a
>>> > previous reduce operation, but I'm confused. The unpacking of the previous
>>> > results, I think has me befuddled ;(
>>> >
>>>
>>> The reduce function is always returning a structure of the form:
>>>
>>> {
>>>    "tag1": N1,
>>>    "tag2": N2,
>>>    ...
>>> }
>>>
>>> When you get to rereduce=true, then the values array is an array of
>>> the structures that your code needs to combine.
>>>
>>> And for brevity, the last bit of code outside the if statement is
>>> just discarding all tags below the top N so that the growth of data
>>> doesn't exceed the log(num_rows) rule.
>>>
>>> HTH,
>>> Paul Davis
>>>
>> That makes sense from what I am reading.  Thanks.
>> Any idea about the python view server from couchdb-python?  The values tuple being
>> returned works just fine; however, the keys tuple, which appears to be a tuple of tuples, is
>> causing my reduce function to fail silently whenever I try to access an element by index.
>>
>> keys looks like [[tag, object_id],[tag, object_id],...]
>>
>> However, when I try to access keys[0] (should be [tag, object_id]) or keys[0][0] (should
>> be 'tag') my reduce script silently fails.  If I just ignore tags and sum the values in a
>> simple map/reduce I get the correct counts as simple vectors, or at least the same answer as
>> the javascript equivalent.  I'm hoping CMLenz will join the discussion or someone can point me
>> in the proper direction.
>>
>> Regards,
>>
>> Jeff
>
> Ok, answering my own post -- you bang on your keyboard long enough and
> you just might keep up with the monkeys<g>
>
> I'm getting closer --
> def reduce(keys,vals,rereduce):
>  tags = {}
>  if not rereduce:
>    for i in range(len(keys)):
>      tags[keys[i][0]] = tags.get(keys[i][0],0) + vals[i]
>
>  else:
>    tags = vals[0]
>    for i in range(1,len(vals)):
>      return vals
>      #tags[val[i][0]] += tags.get(val[i][0],0) + val[i][1]
>
>  return tags
>

Never used the python view server, but I think what you're wanting is
something along the lines of:

def reduce(keys, vals, rereduce):
    N = 5  # number of top tags to return
    tags = {}
    if not rereduce:
        # keys is a list of [tag, object_id] pairs, so the tag is key[0]
        for key, v in zip(keys, vals):
            tags[key[0]] = tags.get(key[0], 0) + v
    else:
        # vals is a list of the dicts returned by earlier reduce calls
        for obj in vals:
            for k, v in obj.items():
                tags[k] = tags.get(k, 0) + v

    # keep only the N most frequent tags so the result stays small
    keepers = sorted(tags, key=tags.get, reverse=True)
    return dict((k, tags[k]) for k in keepers[:N])
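
If you want to poke at the two phases outside of CouchDB, something like
the following might help. It is only a sketch: the name top_tags_reduce,
the n parameter and the sample documents are made up for illustration, and
it assumes that keys arrive as [tag, object_id] pairs (as described earlier
in this thread) and that keys is None on rereduce, which I have not checked
against the python view server.

```python
def top_tags_reduce(keys, vals, rereduce, n=5):
    """Count tags, keeping only the n most frequent ones."""
    tags = {}
    if not rereduce:
        # First pass: keys are [tag, object_id] pairs, vals the emitted counts.
        for key, count in zip(keys, vals):
            tag = key[0]
            tags[tag] = tags.get(tag, 0) + count
    else:
        # Rereduce: vals are dicts returned by earlier calls; keys is unused.
        for partial in vals:
            for tag, count in partial.items():
                tags[tag] = tags.get(tag, 0) + count
    # Trim to the top n tags so the returned structure stays small.
    top = sorted(tags, key=tags.get, reverse=True)[:n]
    return dict((tag, tags[tag]) for tag in top)


# Hypothetical sample data, split into two batches as the view engine might.
keys_a = [["hot", "doc1"], ["python", "doc1"], ["hot", "doc2"]]
keys_b = [["hot", "doc3"], ["cool", "doc3"]]
first = top_tags_reduce(keys_a, [1, 1, 1], False)
second = top_tags_reduce(keys_b, [1, 1], False)
combined = top_tags_reduce(None, [first, second], True)
print(combined)
```

The point is that both branches return the same shape (a dict of tag to
count), so the output of one call can always be fed back in as a value of
the next.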

> gets me partially reduced to:
> -Key-                         - Value-
> "cool"                        {cool: 49}
> "couchdb"                {couchdb: 46}
> "hopeful"                  {hopeful: 1}
> "hot"                          [{hot: 30}, {hot: 38}]
> "neat"                        {neat: 50}
> "python"                   [{python: 25}, {python: 26}]
>
> I've just got to do the proper thing with the lists of values for those
> tags like hot and python.  I'm off for tonight and I'll finish up in
> the morning.  I just didn't want someone wasting their time answering
> a question that I've already muddled my way through  -- map &
> map/reduce is a wiki topic but rereduce is a topic by itself ;)
>
> keeping-up-with-the-shakespearian-monkeys-mostly'ly
>
> Jeff
>
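
The lists for "hot" and "python" look like they come from the "return vals"
inside the rereduce loop in your function: whenever rereduce gets two or
more partial results, it hands the raw list straight back instead of
combining it. A rough sketch of the combining step, with merge_partials
being a name I made up for illustration:

```python
def merge_partials(partials):
    """Sum the counts from a list of {tag: count} dicts into one dict."""
    merged = {}
    for partial in partials:
        for tag, count in partial.items():
            merged[tag] = merged.get(tag, 0) + count
    return merged


# The two list-valued rows from the output quoted above.
print(merge_partials([{"hot": 30}, {"hot": 38}]))        # {'hot': 68}
print(merge_partials([{"python": 25}, {"python": 26}]))  # {'python': 51}
```

Doing that in the else branch, in place of the early return, should leave
every row as a single dict.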
