From: Jeroen van Dijk
To: user@couchdb.apache.org
Date: Tue, 12 Apr 2011 10:21:54 +0200
Subject: Re: Help with an advanced view to build a recommender?

I believe this proposed feature would be of great help:
http://wiki.apache.org/couchdb/Forward_document_references

Is anybody aware of progress on it?

On Mon, Apr 11, 2011 at 10:10 PM, Jeroen van Dijk <jeroentjevandijk@gmail.com> wrote:

> Hi all,
>
> For the last couple of days I have been trying to build a view that would
> act as a recommender. I have tried everything I could find or think of, but
> I can't find a solution. I hope someone can tell me how to do it, or tell me
> that it is simply not possible with a single map/reduce. The problem
> description is below; I hope it is clear enough.
>
> The basic idea is to use co-occurrences of apps attached to a user to
> calculate the similarity between apps. This is what the two types of
> documents, users and apps, look like:
>
> { _id: 'user-1', type: 'user', app_ids: ['app-1', 'app-2'] }
>
> { _id: 'app-1', type: 'app', user_ids: ['user-1', 'user-2'] }
>
> I was hoping the map/reduce approach below would work when adding the
> include_docs=true option. Unfortunately, that option doesn't work with a
> reduce function. So the remaining problem seems to be obtaining the total
> app count together with the co-occurrence counts.
>
> //map
> function(doc) {
>   if (doc.type == "user") {
>     var app_count = doc.app_ids.length;
>     for (var i = 0; i < app_count; i++) {
>       for (var j = i + 1; j < app_count; j++) {
>         emit([doc.app_ids[j], doc.app_ids[i]], [0, 1, 0, {_id: doc.app_ids[i]}]);
>         emit([doc.app_ids[i], doc.app_ids[j]], [0, 1, 0, {_id: doc.app_ids[j]}]);
>       }
>     }
>   }
> }
>
> //reduce
> function(keys, values, rereduce) {
>   //output is [similarity, number of co-occurrences, total number, doc]
>   var output = [0, 0, 0, null];
>
>   values.forEach(function(pair) {
>     output[1] += pair[1];
>     output[2] = pair[2].user_ids.length;
>     output[0] = output[1] / output[2];
>     output[3] = pair[3];
>   });
>
>   return output;
> }
>
> Hopefully someone has new insights that can help me a bit further. Thanks.
>
> Jeroen
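
One possible direction for the problem described above, offered as a minimal sketch rather than a tested CouchDB view: emit the diagonal pair [app, app] alongside every co-occurrence pair, so that a plain sum per key yields both the co-occurrence counts and each app's total user count. The division then happens after the reduce, on the client. The sample documents and the names mapUser, groupedSum and similarity are hypothetical, and the script runs in plain Node.js, not inside CouchDB. It also assumes that app_ids contains no duplicates and that similarity means co-occurrences divided by the first app's user count, which the thread does not confirm.

```javascript
// Hypothetical sample data, shaped like the user documents in the thread.
const users = [
  { _id: 'user-1', type: 'user', app_ids: ['app-1', 'app-2'] },
  { _id: 'user-2', type: 'user', app_ids: ['app-1', 'app-2', 'app-3'] },
  { _id: 'user-3', type: 'user', app_ids: ['app-1'] },
];

// Map step: emit 1 for every ordered pair of apps a user has, including
// the diagonal pair [app, app], which counts the users of that app.
// Assumes app_ids holds no duplicate entries.
function mapUser(doc, emit) {
  if (doc.type !== 'user') return;
  for (const a of doc.app_ids) {
    for (const b of doc.app_ids) {
      emit([a, b], 1);
    }
  }
}

// Stand-in for a grouped sum reduce: total the emitted values per key.
function groupedSum(docs) {
  const totals = new Map();
  for (const doc of docs) {
    mapUser(doc, (key, value) => {
      const k = JSON.stringify(key);
      totals.set(k, (totals.get(k) || 0) + value);
    });
  }
  return totals;
}

// Post-reduce step (assumed definition):
// similarity(a, b) = co-occurrences(a, b) / users(a).
function similarity(totals, a, b) {
  const cooc = totals.get(JSON.stringify([a, b])) || 0;
  const total = totals.get(JSON.stringify([a, a])) || 0;
  return total === 0 ? 0 : cooc / total;
}

const totals = groupedSum(users);
console.log(similarity(totals, 'app-1', 'app-2'));
console.log(similarity(totals, 'app-3', 'app-1'));
```

In CouchDB itself, the map part would presumably be the view's map function, and groupedSum would correspond to the built-in _sum reduce queried with group=true. Because the reduce only sums numbers, it stays valid under rereduce, which a reduce that divides inside the loop does not.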