mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Email and Collab. Filtering
Date Mon, 22 Aug 2011 14:48:28 GMT
I'm working on an example (well, examples) of using Mahout with the ASF Public Data Set hosted
on Amazon (http://aws.amazon.com/datasets/7791434387204566), and I wanted to show how to use
the 3 "C's" (collaborative filtering, clustering, classification) with the data set.  Clustering
and classification are pretty straightforward, but I'm wondering about the setup for collaborative
filtering.

The motivation for recommendations is pretty straightforward:  give people recommendations for
emails that they might find useful, based on what other people have interacted with.  The tricky
part is that I am not sure what a valid setup of the problem looks like.  My current thinking is
that I build up the recommendation matrix based on whether someone has interacted with a thread
(initiated it or replied to it) or not.  Thus, the columns are the thread ids and the rows are the
users.  Each cell contains the number of times user X has interacted with thread Y.  This feels to
me like a stand-in for that user's preference, in that if they are replying multiple times, they
have an interest in that topic.  I have no idea if this will be effective or not, but it seems
like it could be interesting.  Does it sound reasonable?  I worry that even with a data set as
large as the one above, the matrix will simply be too sparse.
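
To make the setup concrete, here is a minimal sketch of the matrix described above, written in
Python rather than against Mahout itself.  Everything in it is an assumption for illustration:
the sample events, the function names, and the integer id assignment are all hypothetical.  The
comma-separated userID,itemID,value lines are the kind of input I would expect a file-based
recommender data model to read, but the exact format should be checked before relying on it.

```python
from collections import Counter


def build_interaction_counts(events):
    """Count how many times each user posted to each thread.

    events: iterable of (user, thread_id) pairs, one pair per message
    (either the initial post or a reply).
    Returns a Counter keyed by (user, thread_id); absent keys are empty cells.
    """
    return Counter(events)


def sparsity(counts):
    """Fraction of empty cells in the user-by-thread matrix."""
    users = {user for user, _ in counts}
    threads = {thread for _, thread in counts}
    total_cells = len(users) * len(threads)
    if total_cells == 0:
        return 0.0
    return 1.0 - len(counts) / total_cells


def to_triples(counts):
    """Render non-empty cells as 'userID,itemID,value' lines with integer ids.

    Ids are assigned by sorted order; this mapping is an assumption made
    only so the output uses numeric ids.
    """
    user_ids = {u: i for i, u in enumerate(sorted({u for u, _ in counts}))}
    thread_ids = {t: i for i, t in enumerate(sorted({t for _, t in counts}))}
    return [
        f"{user_ids[user]},{thread_ids[thread]},{count}"
        for (user, thread), count in sorted(counts.items())
    ]


# Hypothetical sample: alice posts twice to thread t1, bob posts to t1 and t2,
# carol posts once to t3.
events = [
    ("alice", "t1"), ("alice", "t1"), ("bob", "t1"),
    ("bob", "t2"), ("carol", "t3"),
]
counts = build_interaction_counts(events)
triples = to_triples(counts)
```

Running sparsity(counts) on the real data would give a quick read on how empty the matrix is
before any recommender is trained on it.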

Is there a better way to think about this from a strictly collaborative filtering perspective?
In other words, I know I could do content-based recommendations, but that is not what I am
after here.

-Grant

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com

