mahout-user mailing list archives

From Carlos Seminario <recsysu...@gmail.com>
Subject Re: Vectorize the movielens 100K dataset for Mahout k-means clustering
Date Mon, 01 Jul 2013 00:47:14 GMT
OK. Has anyone already done this (per user) and is willing to share either
the vectors or the code? Thanks .. Carlos

On Sun, Jun 30, 2013 at 8:39 PM, Sebastian Schelter <ssc.open@googlemail.com> wrote:

> Simply write a Java program, create the vectors per user or item (don't
> know how you want to cluster) and write them out via SequenceFileWriter.
>
> On 01.07.2013 02:29, Carlos Seminario wrote:
> > Hi: I want to vectorize the movielens 100K dataset as a
> > RandomAccessSparseVector and use it to run Mahout k-means clustering. Has
> > anyone done this before? If not, any ideas on how this can be done? (BTW,
> > the movielens dataset contains ~100K records/lines with this format: userid,
> > itemid, rating, unix time.)
> >
> > Thanks .. Carlos
> >
>
>
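
For anyone following the thread, here is a minimal sketch of the "create the vectors per user" step suggested above. It uses plain Java only, so it is not the Mahout code itself. The class name MovieLensUserVectors and the method buildUserVectors are hypothetical, and the field separator (tab, space, or comma) is an assumption, since the thread only gives the field order. To finish the job on the Mahout side, each inner map would presumably be copied into a RandomAccessSparseVector with set(itemId, rating), wrapped in a VectorWritable, and appended with Hadoop's SequenceFile.Writer; that part is left out here to keep the example dependency-free.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups MovieLens-style rating lines into one sparse rating vector per user. */
final class MovieLensUserVectors {

    /**
     * Parses lines of the form "userid itemid rating unixtime" and returns
     * userId -> (itemId -> rating). Fields may be separated by tabs, spaces,
     * or commas (the separator is an assumption). Only rated items are stored,
     * so each inner map acts as a sparse vector indexed by item id.
     */
    static Map<Integer, Map<Integer, Double>> buildUserVectors(List<String> lines) {
        Map<Integer, Map<Integer, Double>> vectors = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("[\\s,]+");
            if (fields.length < 3) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            int userId = Integer.parseInt(fields[0]);
            int itemId = Integer.parseInt(fields[1]);
            double rating = Double.parseDouble(fields[2]);
            // A fourth field (the unix timestamp), if present, is ignored.
            vectors.computeIfAbsent(userId, id -> new TreeMap<>()).put(itemId, rating);
        }
        return vectors;
    }
}
```

Calling MovieLensUserVectors.buildUserVectors(Files.readAllLines(path)) on the ratings file would give one entry per user. To cluster items instead of users, swap the roles of userId and itemId when filling the map.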
