mahout-user mailing list archives

From Kate Ericson <moving...@gmail.com>
Subject Re: Clustering with KMeans
Date Tue, 08 Feb 2011 13:51:16 GMT
Just to start from the top, can you show the command you've been using
to start the kmeans job?

-Kate

On Mon, Feb 7, 2011 at 9:27 PM, sharath jagannath
<sharathjagannath@gmail.com> wrote:
> OK, now I tried the Digg data. Even now I am getting just one cluster (Digg
> data set: http://www.public.asu.edu/%7Emdechoud/datasets.html).
>
> My sample data for digg:
>
> 9275921    Ever heard about a movie and thought it sounded terrible, but
> then when it came out it turned out to be pretty good, or even great Back in
> August, fellow GeekDad Ken Denmead listed ten geeky movies that should've
> been great, but weren't
>
> 9278984    Last week, an era came to a close in the NFL when Patriots safety
> Rodney Harrison shredded his left calf muscle making an open-field tackle
> against the Broncos. The injury ended his season and quite possibly his
> stellar 15-year career. It also brought to a completion Harrison's long,
> nasty reign as the NFL's dirtiest player. Now who's next?
>
> 9275737    Finally, the most prominent conservative in America has chosen
> his pick for president, and it's liberal Democrat, Barack Obama
>
>
> I am using everything that comes with Mahout out of the box. The only thing I
> wrote was SequenceFromDigg:
>
>   for (String aFit : new FileLineIterable(current, charset, false)) {
>     StringBuilder file = new StringBuilder();
>     StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
>     int tokenCount = tokenizer.countTokens();
>     if (tokenCount < 2)
>       continue;
>     String content = (String) tokenizer.nextToken().toString();
>     while (tokenizer.hasMoreTokens()) {
>       String token = tokenizer.nextToken().toString();
>       file.append(content).append(" ").append(token);
>       file.append("\n");
>     }
>   }
>
>
> and SequenceFromDelicious:
>
>   for (String aFit : new FileLineIterable(current, charset, false)) {
>     StringBuilder file = new StringBuilder();
>     StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
>     int tokenCount = tokenizer.countTokens();
>     if (tokenCount < 2)
>       continue;
>     --tokenCount;
>     String content = (String) tokenizer.nextToken().toString();
>     while (tokenizer.hasMoreTokens()) {
>       String token = tokenizer.nextToken().toString();
>       // Also consider the ranking, currently not handling it.
>       for (int i = 0; i < tokenCount; i++) {
>         file.append(token).append("\t");
>       }
>       --tokenCount;
>       file.append("\n");
>     }
>     writer.write(content, file.toString());
>   }
>
>
> Somebody, please help :D
>
>
> Thanks a lot in advance.
>
>
> Cheers,
>
> Sharath
>
