mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Matrix-based recommendation analysis
Date Tue, 23 Nov 2010 07:24:58 GMT
The GroupLens dataset has User, Item, Rating and Timestamp.
We will use the rating of 1-5 as-is, but will reduce the timestamp
field to day of the week.
The lack of a rating defaults to 3 (neutral). There are 5 ratings
total in the sample:

U1, I1, 2, ?
U1, I3, 4, ?
U2, I1, 4, ?
U2, I2, 5, T
U2, I3, 3, ?

(We'll get to the question marks later.)
Now, make two matrices, User vs. Item and Item vs. Day of the Week.
User vs. Item contains ratings, and Item vs. Day of the Week
contains the number of rating records for that item on that day of the
week. In this sample, ratings only cover Sunday, Monday and Tuesday.

Formatting tables in kerned fonts just plain doesn't work, thus the
alternate format.

2 Users vs. 3 Items:
I1,I2,I3
{
U1  {2,3,4}
U2  {4,5,3}
}
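
As a rough illustration, the User vs. Item matrix can be built from the
five sample records like this. This is only a sketch in plain Python; the
names records, build_user_item and DEFAULT_RATING are made up for the
example and are not Mahout APIs. The day field is left out because it is
not needed for this matrix.

```python
# Hypothetical helper names; the data is the five sample ratings (day omitted).
records = [
    ("U1", "I1", 2),
    ("U1", "I3", 4),
    ("U2", "I1", 4),
    ("U2", "I2", 5),
    ("U2", "I3", 3),
]
users = ["U1", "U2"]
items = ["I1", "I2", "I3"]
DEFAULT_RATING = 3  # a missing rating is treated as neutral


def build_user_item(records, users, items, default=DEFAULT_RATING):
    """Return a dense users x items matrix, filling gaps with the default."""
    ratings = {(user, item): rating for user, item, rating in records}
    return [[ratings.get((user, item), default) for item in items]
            for user in users]


user_item = build_user_item(records, users, items)
print(user_item)  # [[2, 3, 4], [4, 5, 3]]
```

The only filled-in cell is U1/I2, which gets the neutral 3.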

3 Items vs. 7 Days of the Week:
S,M,T,W,T,F,S
{
I1 {1,0,1,0,0,0,0}
I2 {0,0,1,0,0,0,0}
I3 {0,1,1,0,0,0,0}
}

Now, multiply these two matrices. The product is 2 Users vs. 7 Days
of the Week:
S,M,T,W,T,F,S
{
U1 {2,4,9,0,0,0,0}
U2 {4,3,12,0,0,0,0}
}
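
The product can be checked with a few lines of plain Python. This is a
sketch, not Mahout code; matmul is a hypothetical helper written for this
example, and the two input matrices are copied from the tables above.

```python
# Rows are U1, U2; columns are I1, I2, I3.
user_item = [
    [2, 3, 4],
    [4, 5, 3],
]
# Rows are I1, I2, I3; columns are Sunday through Saturday.
item_day = [
    [1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]


def matmul(a, b):
    """Plain row-by-column matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


user_day = matmul(user_item, item_day)
print(user_day)  # [[2, 4, 9, 0, 0, 0, 0], [4, 3, 12, 0, 0, 0, 0]]
```

Note that U1's Tuesday total of 9 includes the default 3 for I2, which U1
never actually rated.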

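This matrix carries the total amount of enthusiasm for each user on
each day. To get the average enthusiasm of each user, divide each entry
by the total number of ratings recorded on that day, i.e. the column
sums of the Item vs. Day of the Week matrix (1, 1 and 3 for Sunday,
Monday and Tuesday). Days with no ratings stay at 0: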
S,M,T,W,T,F,S
{
U1 {2,4,3,0,0,0,0}
U2 {4,3,4,0,0,0,0}
}
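
The averaging step can be sketched the same way in plain Python. The
helper name average_by_day is made up for this example, and treating a
day with no ratings as 0 (rather than undefined) is an assumption that
matches the table above.

```python
# Rows are I1, I2, I3; columns are Sunday through Saturday.
item_day = [
    [1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]
# Product of User vs. Item and Item vs. Day; rows are U1, U2.
user_day = [
    [2, 4, 9, 0, 0, 0, 0],
    [4, 3, 12, 0, 0, 0, 0],
]

# Number of rating records per day: the column sums of item_day.
day_counts = [sum(col) for col in zip(*item_day)]


def average_by_day(totals, counts):
    """Divide each column by its count; days with no ratings stay at 0."""
    return [[total / n if n else 0.0 for total, n in zip(row, counts)]
            for row in totals]


user_day_avg = average_by_day(user_day, day_counts)
print(day_counts)    # [1, 1, 3, 0, 0, 0, 0]
print(user_day_avg)  # [[2.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0], [4.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]]
```

The divisor here is the count of records from all users on that day, not
the count for the individual user.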

Did I get this right, Ted?

BTW, where are your slides for this topic? I've seen them a couple of
times in presentations (live and on Fora.tv), but can't find them.

-- 
Lance Norskog
lance.norskog@gmail.com
