mahout-user mailing list archives

From Daniel Quach <>
Subject Re: how to implement item-based recommender on movie genre data?
Date Thu, 10 May 2012 06:29:53 GMT
Well, actually, I wanted to represent each movie with a vector

[1, 0, 0, 1, 0]

where each column represents an explicit genre: a 1 indicates that the movie has that genre,
while a 0 indicates that it does not (a crude representation, I'm sure).

I wanted to implement an item-based recommender that uses these vectors to compute similarity
between items.
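
To make the similarity step concrete, here is a minimal sketch of one way two such 0/1 genre
vectors could be compared, using the Tanimoto coefficient that Sean suggests in the quoted reply
below (shared genres divided by genres present in either movie). It is plain Java and does not
call Mahout; the class name GenreSimilarity and the sample vectors are made up for illustration.

```java
final class GenreSimilarity {

    /**
     * Tanimoto (Jaccard) coefficient of two equal-length 0/1 genre vectors:
     * (genres both movies have) / (genres at least one movie has).
     * Returns 0.0 when neither movie has any genre set.
     */
    static double tanimoto(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int both = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            boolean inA = a[i] != 0;
            boolean inB = b[i] != 0;
            if (inA && inB) {
                both++;
            }
            if (inA || inB) {
                either++;
            }
        }
        return either == 0 ? 0.0 : (double) both / either;
    }

    public static void main(String[] args) {
        // Hypothetical movies: the first is the example vector from this message.
        int[] movieA = {1, 0, 0, 1, 0};
        int[] movieB = {1, 1, 0, 1, 0};
        System.out.println(tanimoto(movieA, movieB));
    }
}
```

The two sample movies share two genres out of three present in either, so the coefficient is 2/3.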

I think I figured it out: I could represent the vector data as preferences where, instead of user
IDs, the column indices take the place of users. I would then load that into a DataModel for use
with the ItemSimilarity object. The ItemBasedRecommender could load the DataModel with the real
user IDs while using this ItemSimilarity object for calculating similarities.
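
A rough sketch of that transposition, assuming the usual comma-separated "userID,itemID,preference"
line layout that Mahout's file-based data model reads: each genre column index goes in the user
slot, and one line is emitted for every genre a movie has. The class name GenrePreferenceLines, the
movie IDs, and the 1.0 preference value are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GenrePreferenceLines {

    /**
     * Turns per-movie 0/1 genre vectors into "genreIndex,movieId,1.0" lines,
     * so that each genre column plays the role of a user who "prefers"
     * every movie that has that genre. Zero entries produce no line.
     */
    static List<String> toPreferenceLines(long[] movieIds, int[][] genreVectors) {
        if (movieIds.length != genreVectors.length) {
            throw new IllegalArgumentException("one genre vector is needed per movie");
        }
        List<String> lines = new ArrayList<>();
        for (int m = 0; m < movieIds.length; m++) {
            for (int g = 0; g < genreVectors[m].length; g++) {
                if (genreVectors[m][g] != 0) {
                    lines.add(g + "," + movieIds[m] + ",1.0");
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        long[] movieIds = {101L, 102L};   // hypothetical movie IDs
        int[][] genres = {
            {1, 0, 0, 1, 0},
            {0, 1, 0, 1, 0},
        };
        for (String line : toPreferenceLines(movieIds, genres)) {
            System.out.println(line);
        }
    }
}
```

Written to a file, lines like these could presumably be loaded with FileDataModel and handed to an
ItemSimilarity implementation such as TanimotoCoefficientSimilarity, while the recommender itself
keeps a separate DataModel built from the real user preferences. I have not checked this against a
specific Mahout version.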

This could be a poor choice from an efficiency, accuracy, and machine learning standpoint;
I am not an expert on the subject at all.

On May 8, 2012, at 12:58 AM, Sean Owen wrote:

> So you have already decided, for each movie, whether it's in or not in each
> genre? And then you want to create a "profile" -- assuming you mean some
> kind of meta-genre?
> This isn't a recommender problem; it's just a clustering problem. I'd use
> the Tanimoto similarity.
> You could run the clustering-based recommender just to build the clusters.
> You wouldn't use it for recommendations.
> On Tue, May 8, 2012 at 8:53 AM, Daniel Quach <> wrote:
>> Suppose that I want to give each movie a profile based on the genres each
>> contains.
>> For naive and simplistic purposes, let's pretend that each movie has a
>> vector where each column is a genre, a 1 in that column indicates that the
>> movie contains that genre, 0 otherwise.
>> How would I feed such data into an Item-based Recommender? I want this
>> recommender to use these vectors for calculating similarity for
>> recommendations, which in turn is used for preference estimation (just as
>> described in section 4.4.1 of the Mahout in Action book)
>> The example in the book is not immediately clear to me. The sample code
>> does not mention the format of the data being used in creating the
>> ItemSimilarity object.
