mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Clustering boolean vectors
Date Tue, 10 May 2011 15:29:39 GMT
(Reposting my reply to the original copy of the message.)
GroupLens doesn't *require* a rating per se -- you are free to ignore
it if you want!

Boolean data is all 1, in Mahout. There are no 0 ratings. If you just
mean that the non-existent preferences are "0", OK. But having two
ratings, 0 and 1, along with the possibility of not existing, is three
states, not two.
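Here is a minimal Java sketch of the point about states. It uses only the JDK, and the class names are hypothetical, not Mahout classes. Boolean data only needs to record which items a user is associated with, so presence is the "1" and absence is the only other state. Storing explicit 0/1 values adds a third state: null, meaning "no preference at all".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: boolean preferences as a set of item IDs per user.
// An item being present means "1"; absence is the only other state.
class BooleanPrefs {
    private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();

    void add(long userID, long itemID) {
        itemsByUser.computeIfAbsent(userID, k -> new HashSet<>()).add(itemID);
    }

    boolean has(long userID, long itemID) {
        Set<Long> items = itemsByUser.get(userID);
        return items != null && items.contains(itemID);
    }
}

// By contrast, explicit 0/1 ratings plus "not rated" give three states:
// Boolean.TRUE, Boolean.FALSE, or null (no preference stored).
class ThreeStatePrefs {
    private final Map<Long, Map<Long, Boolean>> ratings = new HashMap<>();

    void rate(long userID, long itemID, boolean value) {
        ratings.computeIfAbsent(userID, k -> new HashMap<>()).put(itemID, value);
    }

    Boolean get(long userID, long itemID) {
        Map<Long, Boolean> byItem = ratings.get(userID);
        return byItem == null ? null : byItem.get(itemID);
    }
}
```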

You can easily have a DataModel, if you have the GroupLens data.
Convert it to CSV, or just use the GroupLensDataModel in examples/.
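Below is a rough sketch of the CSV conversion. It assumes the GroupLens ratings file uses MovieLens-style UserID::MovieID::Rating::Timestamp lines, so check your copy of the data. Mahout's FileDataModel reads comma-separated userID,itemID[,preference[,timestamp]] lines, and leaving out the preference column gives boolean data. GroupLensToCsv and its methods are hypothetical names used for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts assumed "UserID::MovieID::Rating::Timestamp"
// lines into comma-separated lines suitable for a file-based DataModel.
class GroupLensToCsv {

    // Converts one line; returns null for blank lines. When booleanPrefs is
    // true, the rating is dropped so only "userID,itemID" remains.
    static String convertLine(String line, boolean booleanPrefs) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] fields = trimmed.split("::");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return booleanPrefs
            ? fields[0] + "," + fields[1]
            : fields[0] + "," + fields[1] + "," + fields[2];
    }

    // Converts every line from the reader, skipping blank lines.
    static List<String> convertAll(BufferedReader in, boolean booleanPrefs) throws IOException {
        List<String> out = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String csv = convertLine(line, booleanPrefs);
            if (csv != null) {
                out.add(csv);
            }
        }
        return out;
    }
}
```

For example, convertLine("1::1193::5::978300760", true) yields "1,1193". You can then write the returned lines to a file and load that file with FileDataModel.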

But, to really answer your question: first you should define what you
are trying to do. Then we can help decide how to do it. I don't know
if you need clustering or not so far.

On Tue, May 10, 2011 at 3:34 PM, Steven Bourke <sbourke@gmail.com> wrote:
>
> Are you trying to find similar items or recommend movies? If you are using
> the cluster approach, you will just find movies with similar genres, so the
> recommendation aspect of the work will only return recommended clusters of
> movies back to the user.
>
> On Tue, May 10, 2011 at 3:08 PM, mail2abin <mail2abin@gmail.com> wrote:
>
> > Hi,
> >
> >
> > I was trying to run ItemBasedRecommender on the GroupLens movie sample
> > data, which requires ratings (user preferences as input). But suppose I do
> > not have ratings (user preferences); rather, I have an item boolean
> > attribute vector [like The Godfather - 0|1|0|0|0|0|1], where the two 1s
> > might mean Crime and Drama.
> >
> > ItemBasedRecommender requires a DataModel, which I do not have. Instead,
> > as I understand it, I should use some clustering technique based on the
> > item boolean attribute vectors, and later get the items that belong to
> > each cluster.
> >
> > Please give pointers to the right clustering API (though I have seen
> > TanimotoCluster etc., I am not sure if they are good for boolean vectors).
> >
> > -----
> > Abin Varghese
> > Software Engineer
> > NY
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Clustering-boolean-vectors-tp2923176p2923176.html
> > Sent from the Mahout User List mailing list archive at Nabble.com.
> >
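On whether Tanimoto suits boolean vectors: for 0/1 vectors, the Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b) reduces to |A and B| / |A or B| (the Jaccard coefficient). Because it only counts attributes that are present, it is a natural fit for this kind of data. The sketch below is plain JDK Java, not Mahout's API. The input format follows the 0|1|0|0|0|0|1 example in the question, and which column means which genre is an assumption.

```java
import java.util.BitSet;

// Sketch: boolean item-attribute vectors (one column per genre; column meanings
// are hypothetical) compared with the Tanimoto coefficient, which for 0/1
// vectors equals (shared 1s) / (columns where either vector has a 1).
class BooleanAttributes {

    // Parses a pipe-separated 0/1 string into a BitSet of the columns set to 1.
    static BitSet parse(String vector) {
        String[] cells = vector.trim().split("\\|");
        BitSet bits = new BitSet(cells.length);
        for (int i = 0; i < cells.length; i++) {
            String cell = cells[i].trim();
            if (cell.equals("1")) {
                bits.set(i);
            } else if (!cell.equals("0")) {
                throw new IllegalArgumentException("Not a 0/1 value: " + cell);
            }
        }
        return bits;
    }

    // Tanimoto coefficient for binary vectors. Two all-zero vectors are
    // treated as identical (1.0) to avoid dividing by zero.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionSize = union.cardinality();
        return unionSize == 0 ? 1.0 : (double) intersection.cardinality() / unionSize;
    }

    // Distance form, as used when clustering: 0.0 means identical attributes.
    static double tanimotoDistance(BitSet a, BitSet b) {
        return 1.0 - tanimoto(a, b);
    }
}
```

For example, 0|1|0|0|0|0|1 and 0|1|0|0|0|0|0 share one of the two columns that are set, so their coefficient is 0.5.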

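To make Steven's point concrete: clustering on genre vectors alone groups movies that share genres, and no user data enters into it. The sketch below uses a simple leader (threshold) clustering in plain Java, chosen for brevity. It is not Mahout's own clustering code, and the threshold, names, and data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Sketch of leader clustering over boolean genre vectors with Tanimoto distance.
// Hypothetical example code; results depend on the iteration order of the map.
class LeaderClustering {

    // Tanimoto distance for binary vectors; two empty vectors count as identical.
    static double tanimotoDistance(BitSet a, BitSet b) {
        BitSet inter = (BitSet) a.clone();
        inter.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int u = union.cardinality();
        return u == 0 ? 0.0 : 1.0 - (double) inter.cardinality() / u;
    }

    // Each item joins the first cluster whose leader (first member) lies within
    // maxDistance; otherwise it starts a new cluster and becomes its leader.
    static List<List<String>> cluster(Map<String, BitSet> items, double maxDistance) {
        List<BitSet> leaders = new ArrayList<>();
        List<List<String>> clusters = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : items.entrySet()) {
            int target = -1;
            for (int i = 0; i < leaders.size(); i++) {
                if (tanimotoDistance(leaders.get(i), e.getValue()) <= maxDistance) {
                    target = i;
                    break;
                }
            }
            if (target < 0) {
                leaders.add(e.getValue());
                clusters.add(new ArrayList<>());
                target = leaders.size() - 1;
            }
            clusters.get(target).add(e.getKey());
        }
        return clusters;
    }

    // Convenience: builds a BitSet with the given genre columns set.
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }
}
```

As an example, consider a LinkedHashMap holding A={1,6}, B={1,6}, C={0}, and D={1}, clustered with maxDistance 0.5. The result is [[A, B, D], [C]]. D shares one of A's two genres, so it joins A's cluster, while C shares none. "Recommending" from a cluster therefore just returns movies with overlapping genres.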