mahout-user mailing list archives

From eric konsirald
Subject building a (weighted) movie similarity measure
Date Wed, 14 Sep 2011 16:29:29 GMT

i'm working on an experiment where i have a catalog of movies from IMDB
 containing all the metadata for each movie
(title/description/year/director/actors/etc...) and i would like to solve
the following problem:

INPUT: a movie title (or id in imdb)
OUTPUT: the most "similar" movies

but i have no user base or user activity, just the pure movie items.
so by "similar" i mean the movies having the most similar title and/or
description and/or director etc...

i'm not sure how to build an appropriate global similarity measure. for the
description i could e.g. build term vectors containing the most frequent
words (using e.g. tf/idf) or use lda, but then i have nothing other than
intuition to decide how much weight to give to the similarity between
descriptions, the similarity between actors, (approximately) the same year,
etc.
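
to make the question concrete, here is a minimal sketch of the kind of weighted measure described above. it is only an illustration under stated assumptions: the toy catalog, the field names, the smoothed idf formula, the 10-year decay scale and the weights are all made up, and nothing here uses mahout.

```python
import math
from collections import Counter

# toy catalog; every id, field and value here is invented for illustration
MOVIES = {
    "m1": {"description": "a detective hunts a killer in the city",
           "actors": {"a. smith", "b. jones"}, "year": 1995},
    "m2": {"description": "a detective chases a killer across the city",
           "actors": {"a. smith", "c. brown"}, "year": 1997},
    "m3": {"description": "two friends open a bakery in paris",
           "actors": {"d. white"}, "year": 2010},
}

def tfidf_vectors(docs):
    """Map each doc id to a sparse tf-idf vector (dict of term -> weight)."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    n_docs = len(docs)
    vectors = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        # smoothed idf (an assumption; other idf variants work too)
        vectors[doc_id] = {
            t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Overlap between two sets, e.g. actor lists."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def year_similarity(y1, y2, scale=10.0):
    """1.0 for the same year, decaying with the gap (scale is a guess)."""
    return math.exp(-abs(y1 - y2) / scale)

# hypothetical weights; choosing these is exactly the open question
WEIGHTS = {"description": 0.5, "actors": 0.3, "year": 0.2}
VECTORS = tfidf_vectors({k: m["description"] for k, m in MOVIES.items()})

def similarity(id_a, id_b, weights=WEIGHTS):
    """Weighted average of the per-field similarities, in [0, 1]."""
    a, b = MOVIES[id_a], MOVIES[id_b]
    parts = {
        "description": cosine(VECTORS[id_a], VECTORS[id_b]),
        "actors": jaccard(a["actors"], b["actors"]),
        "year": year_similarity(a["year"], b["year"]),
    }
    total = sum(weights.values())
    return sum(weights[name] * parts[name] for name in parts) / total

def most_similar(movie_id, k=2, weights=WEIGHTS):
    """Return the k other movies with the highest weighted similarity."""
    scored = [(other, similarity(movie_id, other, weights))
              for other in MOVIES if other != movie_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

print(most_similar("m1"))
```

each per-field similarity is kept in [0, 1] so that the weights are comparable across fields; the weights themselves remain the part that has to be chosen or learned.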

has anyone had to deal with a similar problem, or does anyone have any
insights on how to approach it?
also, does mahout contain any tools that would help me build such a
(weighted) similarity measure and, most importantly, let me test whether
one similarity is better than another?
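
as an illustration of what "experimenting" could mean without any user activity, one option is to hand-label a few query movies with the movies a person would call similar, and compare weightings by precision at k. the sketch below assumes such labels exist; the per-field similarity numbers, the labels and both weightings are invented, and this is not a mahout api.

```python
# per-field similarities for (query -> candidate) pairs; invented numbers
FEATURE_SIMS = {
    "m1": {
        "m2": {"description": 0.8, "actors": 0.3, "year": 0.8},
        "m3": {"description": 0.1, "actors": 0.0, "year": 0.2},
        "m4": {"description": 0.2, "actors": 0.9, "year": 0.9},
    },
    "m3": {
        "m1": {"description": 0.1, "actors": 0.0, "year": 0.2},
        "m2": {"description": 0.1, "actors": 0.0, "year": 0.3},
        "m4": {"description": 0.7, "actors": 0.1, "year": 0.6},
    },
}

# hypothetical hand-made judgments: movies a person considers similar
RELEVANT = {"m1": {"m2"}, "m3": {"m4"}}

def rank(query, weights):
    """Order the candidates of a query by weighted similarity, best first."""
    total = sum(weights.values())
    scores = {
        cand: sum(weights[f] * sims[f] for f in weights) / total
        for cand, sims in FEATURE_SIMS[query].items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that were judged relevant."""
    return sum(1 for cand in ranked[:k] if cand in relevant) / k

def mean_precision_at_k(weights, k=1):
    """Average precision at k over all labeled queries."""
    values = [precision_at_k(rank(q, weights), RELEVANT[q], k)
              for q in RELEVANT]
    return sum(values) / len(values)

# two candidate weightings to compare (both are assumptions)
DESCRIPTION_HEAVY = {"description": 0.6, "actors": 0.2, "year": 0.2}
ACTOR_HEAVY = {"description": 0.2, "actors": 0.6, "year": 0.2}

for name, weights in [("description-heavy", DESCRIPTION_HEAVY),
                      ("actor-heavy", ACTOR_HEAVY)]:
    print(name, mean_precision_at_k(weights, k=1))
```

with enough labeled queries, the same loop could be run over a grid of weightings to pick the best one, ideally scoring the final choice on labels held out from the search.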

thanks a lot in advance for any insights

