mahout-user mailing list archives

From sharath jagannath <sharathjagann...@gmail.com>
Subject Re: Clustering with KMeans
Date Tue, 08 Feb 2011 18:10:47 GMT
I am not using the command line. I have written a class extending AbstractJob,
and the flow is as follows:

Convert the data to sequence files (all the data mentioned) using the code
from the previous email -> generate vectors using
SparseVectorsFromSequenceFiles (I converted its main into a static method so I
can call it from my class) -> generate seeds using Canopy -> cluster using
KMeans.
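
A minimal sketch of the per-line parsing in the first step, assuming each Digg line has the form id<TAB>text. DiggLineParser is a hypothetical name for illustration, not a Mahout class, and it uses only the Java standard library. In the real job each (id, text) pair would then be written to the sequence file as key and value, which is left out here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical helper (not part of Mahout): parses one "id<TAB>text" line. */
final class DiggLineParser {

    private DiggLineParser() {}

    /**
     * Splits a line at the first tab into (id, text).
     * Returns null when there is no tab or when the id or text is empty,
     * which mirrors the "tokenCount < 2 -> continue" check in the loop.
     * Any further tabs inside the text are replaced by spaces (an assumption).
     */
    public static Map.Entry<String, String> parse(String line) {
        int tab = line.indexOf('\t');
        if (tab <= 0) {
            return null; // no tab at all, or nothing before the first tab
        }
        String id = line.substring(0, tab).trim();
        String text = line.substring(tab + 1).replace('\t', ' ').trim();
        if (id.isEmpty() || text.isEmpty()) {
            return null;
        }
        return new SimpleImmutableEntry<>(id, text);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> record =
            parse("9275737\tFinally, the most prominent conservative in America");
        if (record != null) {
            // The key would become the sequence-file key, the value its text.
            System.out.println(record.getKey() + " => " + record.getValue());
        }
    }
}
```

Keeping the id as the key and the whole text as a single value gives one document per record, which is what the vectorization step expects to tokenize.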

Thanks for the response.

Cheers,
Sharath

On Tue, Feb 8, 2011 at 5:51 AM, Kate Ericson <movingb0x@gmail.com> wrote:

> Just to start from the top, can you show the command you've been using
> to start the kmeans job?
>
> -Kate
>
> On Mon, Feb 7, 2011 at 9:27 PM, sharath jagannath
> <sharathjagannath@gmail.com> wrote:
> > OK, now I tried the Digg data. Even now I am getting just one cluster
> > (Digg data set: http://www.public.asu.edu/%7Emdechoud/datasets.html)
> >
> > My sample data for digg:
> >
> > 9275921    Ever heard about a movie and thought it sounded terrible, but
> > then when it came out it turned out to be pretty good, or even great Back in
> > August, fellow GeekDad Ken Denmead listed ten geeky movies that should've
> > been great, but weren't
> >
> > 9278984    Last week, an era came to a close in the NFL when Patriots safety
> > Rodney Harrison shredded his left calf muscle making an open-field tackle
> > against the Broncos. The injury ended his season and quite possibly his
> > stellar 15-year career. It also brought to a completion Harrison's long,
> > nasty reign as the NFL's dirtiest player. Now who's next?
> >
> > 9275737    Finally, the most prominent conservative in America has chosen
> > his pick for president, and it's liberal Democrat, Barack Obama
> >
> >
> > I am using everything that comes right out of Mahout's box. The only thing
> > I wrote was SequenceFromDigg:
> >
> >   for (String aFit : new FileLineIterable(current, charset, false)) {
> >       StringBuilder file = new StringBuilder();
> >       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
> >       int tokenCount = tokenizer.countTokens();
> >       if (tokenCount < 2)
> >             continue;
> >       String content = (String) tokenizer.nextToken().toString();
> >       while (tokenizer.hasMoreTokens()) {
> >             String token = tokenizer.nextToken().toString();
> >             file.append(content).append(" ").append(token);
> >             file.append("\n");
> >       }
> >   }
> >
> >
> > and SequenceFromDelicious:
> >
> >   for (String aFit : new FileLineIterable(current, charset, false)) {
> >       StringBuilder file = new StringBuilder();
> >       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
> >       int tokenCount = tokenizer.countTokens();
> >       if (tokenCount < 2)
> >             continue;
> >       --tokenCount;
> >       String content = (String) tokenizer.nextToken().toString();
> >       while (tokenizer.hasMoreTokens()) {
> >             String token = tokenizer.nextToken().toString();
> >             // Also consider the ranking, currently not handling it.
> >             for (int i = 0; i < tokenCount; i++) {
> >                   file.append(token).append("\t");
> >             }
> >             --tokenCount;
> >             file.append("\n");
> >       }
> >       writer.write(content, file.toString());
> >   }
> >
> >
> > Somebody, please help :D
> >
> >
> > Thanks a lot in advance.
> >
> >
> > Cheers,
> >
> > Sharath
> >
>
