mahout-user mailing list archives

From sharath jagannath <sharathjagann...@gmail.com>
Subject Re: Clustering with KMeans
Date Tue, 08 Feb 2011 04:27:36 GMT
OK, now I tried the Digg data, and I am still getting just one cluster. (Digg
data set: http://www.public.asu.edu/%7Emdechoud/datasets.html)

My sample data for Digg:

9275921    Ever heard about a movie and thought it sounded terrible, but
then when it came out it turned out to be pretty good, or even great Back in
August, fellow GeekDad Ken Denmead listed ten geeky movies that should've
been great, but weren't

9278984    Last week, an era came to a close in the NFL when Patriots safety
Rodney Harrison shredded his left calf muscle making an open-field tackle
against the Broncos. The injury ended his season and quite possibly his
stellar 15-year career. It also brought to a completion Harrison's long,
nasty reign as the NFL's dirtiest player. Now who's next?

9275737    Finally, the most prominent conservative in America has chosen
his pick for president, and it's liberal Democrat, Barack Obama


I am using everything straight out of the box from Mahout. The only things I
wrote were SequenceFromDigg:

   for (String aFit : new FileLineIterable(current, charset, false)) {
       StringBuilder file = new StringBuilder();
       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
       int tokenCount = tokenizer.countTokens();
       if (tokenCount < 2)
           continue;
       String content = (String) tokenizer.nextToken().toString();
       while (tokenizer.hasMoreTokens()) {
           String token = tokenizer.nextToken().toString();
           file.append(content).append(" ").append(token);
           file.append("\n");
       }
   }


and SequenceFromDelicious:

   for (String aFit : new FileLineIterable(current, charset, false)) {
       StringBuilder file = new StringBuilder();
       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
       int tokenCount = tokenizer.countTokens();
       if (tokenCount < 2)
           continue;
       --tokenCount;
       String content = (String) tokenizer.nextToken().toString();
       while (tokenizer.hasMoreTokens()) {
           String token = tokenizer.nextToken().toString();
           // Also consider the ranking, currently not handling it.
           for (int i = 0; i < tokenCount; i++) {
               file.append(token).append("\t");
           }
           --tokenCount;
           file.append("\n");
       }
       writer.write(content, file.toString());
   }
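
To make it easier to see what ends up in each value without running Hadoop,
here is a minimal JDK-only sketch of the string building in the two loops
above. The class and method names (SequenceValueSketch, diggValue,
deliciousValue) and the example.org input line are made up for illustration;
FileLineIterable and the writer are left out, so this only shows the text
that would be produced for a single input line.

```java
import java.util.StringTokenizer;

// JDK-only sketch of the two loops above. All names here are hypothetical,
// and no Mahout or Hadoop classes are involved.
class SequenceValueSketch {

    // Mirrors the Digg loop: the first tab-separated field is the key, and
    // every later field is appended as "<key> <field>\n". Returns null when
    // the line has fewer than two fields (the loop above skips such lines).
    static String diggValue(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line, "\t");
        if (tokenizer.countTokens() < 2) {
            return null;
        }
        String content = tokenizer.nextToken();
        StringBuilder file = new StringBuilder();
        while (tokenizer.hasMoreTokens()) {
            file.append(content).append(" ").append(tokenizer.nextToken());
            file.append("\n");
        }
        return file.toString();
    }

    // Mirrors the Delicious loop: the first field is the key and is not part
    // of the value. With n fields in total, the first remaining field is
    // repeated n-1 times, the next n-2 times, and so on down to once.
    static String deliciousValue(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line, "\t");
        int tokenCount = tokenizer.countTokens();
        if (tokenCount < 2) {
            return null;
        }
        --tokenCount;
        tokenizer.nextToken(); // skip the key
        StringBuilder file = new StringBuilder();
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            for (int i = 0; i < tokenCount; i++) {
                file.append(token).append("\t");
            }
            --tokenCount;
            file.append("\n");
        }
        return file.toString();
    }

    public static void main(String[] args) {
        // Print the value text for one Digg-style and one Delicious-style line.
        System.out.print(diggValue("9275737\tFinally, the most prominent conservative"));
        System.out.print(deliciousValue("http://example.org\tjava\tmahout\tkmeans"));
    }
}
```

Under these assumptions the Digg value repeats the ID in front of the text,
and the Delicious value repeats earlier fields more often than later ones.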


Could somebody please help? :D


Thanks a lot in advance.


Cheers,

Sharath
