Date: Tue, 11 May 2010 22:08:27 +0100
Subject: Re: RecommenderJob output
From: Sean Owen
To: user@mahout.apache.org

Can you update it while it's running? Not really. It's a multi-phase
batch job, and I don't think you could meaningfully change it on the fly.

Do you need to run the whole thing every time? No, not at all. Phase 1
(item IDs to item indices) doesn't need to run every time, nor does
phase 3 (count co-occurrence). It's OK if these are a little out of date.

Phase 2 is user vector generation. I didn't write any way to simply
append a new user vector to its output, but that's easy to write, so
you don't have to run that phase every time either.

Phases 4 and 5 are really where the recommendation happens. Those go
together. You can limit which users they process, though, with a file of
user IDs: --usersFile.

I'd say the core job is nearing maturity; I think it's tuned and
debugged. But these kinds of practical hooks, like being able to
incrementally update aspects of the pipeline, are exactly what's needed
next. I'd welcome your input and patches in this regard.

Sean

On Tue, May 11, 2010 at 10:00 PM, First Qaxy wrote:
> One question on the recommendation lifecycle: once a RecommenderJob is
> being run, with the intermediate/temp model being created, what is the
> process of maintaining it?
> Can I update it, or parts of it, to reflect new data?
> For example, if I have a new user, or new preferences for an existing
> user that I want to compute recommendations for, can I do that by
> incrementally updating the internal model and regenerating
> recommendations only for the user that I'm interested in?
>
> Thanks.
> -qf
> --- On Tue, 5/11/10, Sean Owen wrote:
>
> From: Sean Owen
> Subject: Re: RecommenderJob output
> To: user@mahout.apache.org
> Cc: mahout-user@lucene.apache.org
> Received: Tuesday, May 11, 2010, 3:55 AM
>
> I just committed more of my local changes, since I'm actively
> improving and fixing things here.
>
> My output looks more reasonable:
>
> 101    [1015:4.0,1021:3.0,1020:3.0]
> 102    [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
> 103    [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
> 105    [1005:14.0,1021:3.0,1020:3.0]
> 106    [1005:12.0,1021:4.0,1015:3.0]
>
> So you might just try the code from head. booleanData doesn't really
> affect the output; it just enables optimizations for this case.
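
For anyone who wants to post-process output like the sample quoted
above, here is a minimal Python sketch. It is not part of Mahout. It
assumes each line is a user ID, some whitespace, and a bracketed,
comma-separated list of itemID:score pairs, which is only what the
sample shows; the function name parse_recommendations is made up for
illustration.

```python
import re

# Assumed layout, inferred only from the sample output in this thread:
# <userID><whitespace>[<itemID>:<score>,<itemID>:<score>,...]
LINE_RE = re.compile(r"^\s*(\d+)\s+\[(.*)\]\s*$")


def parse_recommendations(line):
    """Return (user_id, [(item_id, score), ...]) for one output line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognized line: %r" % line)
    user_id = int(match.group(1))
    body = match.group(2).strip()
    items = []
    if body:  # an empty list "[]" yields no recommendations
        for pair in body.split(","):
            item_id, score = pair.split(":")
            items.append((int(item_id), float(score)))
    return user_id, items


if __name__ == "__main__":
    sample = "101\t[1015:4.0,1021:3.0,1020:3.0]"
    user_id, items = parse_recommendations(sample)
    print(user_id, items)
```

Keeping the pairs in the order they appear preserves whatever ranking
the job emitted, so a caller can take the first few entries as the top
recommendations for that user.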