Subject: Re: Matrix-based recommendation analysis
From: Sean Owen
To: user@mahout.apache.org
Date: Tue, 23 Nov 2010 07:50:09 +0000

(PS I don't think that link from Ted is publicly visible, but try
http://www.slideshare.net/tdunning )

Maybe I'm walking into half of another conversation, but what's the
question or goal here?

I don't think the matrix product contains quite what you're saying.
For example, U1 records only 2 ratings but has some "enthusiasm" on 3
separate days in the matrix product. The product is mashing together
item-day associations from all users and applying them to each user.

Conceptually, user-item-day is the 3-dimensional matrix that it sounds
like you want, if you want to distinguish associations from different
users to different items on different days.

On Tue, Nov 23, 2010 at 7:24 AM, Lance Norskog wrote:
> The GroupLens dataset has User, Item, Rating and Timestamp.
> We will use the rating of 1-5 as-is, but will reduce the timestamp
> field to day of the week.
> The lack of a rating defaults to 3 (neutral). There are 5 ratings
> total in the sample:
>
> U1, I1, 2, ?
> U1, I3, 4, ?
> U2, I1, 4, ?
> U2, I2, 5, T
> U2, I3, 3, ?
>
> (We'll get to the question marks later.)
> Now, make two matrices, User vs. Item and Item vs. Day of the Week.
> User vs. Item contains ratings, and Item vs. Day of the Week
> contains the number of rating records for that item on that day of the
> week: ratings only cover Sunday, Monday and Tuesday.
>
> Formatting tables in kerned fonts just plain doesn't work, thus the
> alternate format.
>
> 2 Users vs. 3 Items:
>     I1,I2,I3
> {
> U1 {2,3,4}
> U2 {4,5,3}
> }
>
> 3 Items vs. 7 Days of the Week:
>     S,M,T,W,T,F,S
> {
> I1 {1,0,1,0,0,0,0}
> I2 {0,0,1,0,0,0,0}
> I3 {0,1,1,0,0,0,0}
> }
>
> Now, multiply these two matrices. The product is 2 Users vs. 7 Days
> of the Week:
>     S,M,T,W,T,F,S
> {
> U1 {2,4,9,0,0,0,0}
> U2 {4,3,12,0,0,0,0}
> }
>
> This matrix carries the total amount of enthusiasm for each user on
> each day. To get the average enthusiasm of each user, divide each row
> by the total number of ratings per day:
>     S,M,T,W,T,F,S
> {
> U1 {2,4,3,0,0,0,0}
> U2 {4,3,4,0,0,0,0}
> }
>
> Did I get this right, Ted?
>
> BTW, where are your slides for this topic? I've seen them a couple of
> times in presentations (live and on Fora.tv), but can't find them.
>
> --
> Lance Norskog
> lance.norskog@gmail.com
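
For anyone following along, the arithmetic in the quoted message can be
checked with a few lines of plain Python. This is only a sketch under
stated assumptions: the names user_item, item_day, matmul and records are
made up for illustration and are not Mahout APIs, missing ratings are
filled with 3 as the message describes, and days shown as "?" are kept as
None rather than guessed. It reproduces the product and the per-day
average, then shows the point made in the reply: U1 has 2 real ratings but
non-zero "enthusiasm" on 3 days, because the item-day counts come from all
users.

```python
# User vs. Item ratings; the missing U1/I2 rating defaults to 3 (neutral).
user_item = [
    [2, 3, 4],  # U1
    [4, 5, 3],  # U2
]

# Item vs. Day of the Week (S,M,T,W,T,F,S): rating records per item per day.
item_day = [
    [1, 0, 1, 0, 0, 0, 0],  # I1
    [0, 0, 1, 0, 0, 0, 0],  # I2
    [0, 1, 1, 0, 0, 0, 0],  # I3
]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# 2 Users vs. 7 Days: total "enthusiasm" per user per day.
product = matmul(user_item, item_day)

# Total rating records per day (column sums of item_day).
ratings_per_day = [sum(col) for col in zip(*item_day)]

# Divide each row by the per-day counts; days with no records stay 0.
average = [[v / n if n else 0.0 for v, n in zip(row, ratings_per_day)]
           for row in product]

# The five sample records as (user, item, rating, day); unknown days are None.
records = [
    ("U1", "I1", 2, None),
    ("U1", "I3", 4, None),
    ("U2", "I1", 4, None),
    ("U2", "I2", 5, "T"),
    ("U2", "I3", 3, None),
]

# U1 actually rated 2 items, yet the product shows U1 on 3 separate days.
u1_rated = sum(1 for user, _, _, _ in records if user == "U1")
u1_days = sum(1 for v in product[0] if v != 0)

# A sparse user-item-day structure keeps each association separate.
# Only records with a known day can be placed in it.
tensor = {(u, i, d): r for u, i, r, d in records if d is not None}

print(product)
print(average)
print(u1_rated, u1_days)
print(tensor)
```

Keying on (user, item, day) is one simple way to hold the 3-dimensional
data without mixing users together; whether that suits the actual goal
depends on the question being asked.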