mahout-user mailing list archives

From Toby Doig <toby.d...@gmail.com>
Subject clustering your data with dirichlet issue
Date Tue, 06 Apr 2010 04:42:15 GMT
I've run the Dirichlet job from the command line and now have an output
folder containing state-0, state-1, ..., state-5 folders, each of which
contains part-00000 and .part-00000.crc files. However, the Retrieving the
Output section of the ClusteringYourData wiki page just says TODO, so I
don't know how to turn those part files into something useful.

    http://cwiki.apache.org/MAHOUT/clusteringyourdata.html

I successfully ran the
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job test, which
output its results as text (to the console, at least), so I tried copying
the printResults() methods from that class into
org.apache.mahout.clustering.dirichlet.DirichletJob, but to no avail.

Can someone help?

Also, when running the command-line job it asks for the prototypeSize (-s
parameter). When I converted my Lucene index to a vector file, the output
said it created 11 vectors, but when I specified that value for
prototypeSize the job failed, saying it found 1793 vectors. Changing the
value I specify to 1793 works, but now I wonder why I need to specify it at
all if the job can figure it out. Could it not be optional?
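A minimal sketch of how the value could be inferred, assuming (not confirmed by the job's documentation here) that prototypeSize means the cardinality of each input vector, i.e. its number of dimensions (for a Lucene index, presumably the number of distinct terms), rather than the number of vectors. That reading would explain why 11 (the vector count) failed while 1793 worked. The class PrototypeSizeSketch and its method are hypothetical and not part of Mahout; plain double[] arrays stand in for Mahout's vector types.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Mahout): infers the prototype size for a
 * Dirichlet run from the input vectors, assuming "prototype size" means the
 * cardinality (number of dimensions) shared by every input vector.
 */
public final class PrototypeSizeSketch {

    private PrototypeSizeSketch() {}

    /** Returns the cardinality shared by all vectors, or throws if they disagree. */
    public static int inferPrototypeSize(List<double[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("no input vectors");
        }
        // Take the first vector's length as the expected cardinality.
        int size = vectors.get(0).length;
        for (double[] v : vectors) {
            if (v.length != size) {
                throw new IllegalArgumentException(
                    "cardinality mismatch: expected " + size + " but found " + v.length);
            }
        }
        return size;
    }

    public static void main(String[] args) {
        // Two vectors with four dimensions each: the prototype size is 4, not 2.
        List<double[]> vectors = List.of(
            new double[] {1, 0, 2, 0},
            new double[] {0, 3, 1, 5});
        System.out.println(inferPrototypeSize(vectors)); // prints 4
    }
}
```

Under this assumption, a driver could read the first vector from the input and use its cardinality as the default, making -s optional.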
