mahout-user mailing list archives

From Aleksander Stensby <>
Subject Some basic introductory questions
Date Thu, 17 Sep 2009 07:36:50 GMT
Hi all,
I've been following the development of Mahout for quite a while now and
figured it was time for me to get my hands dirty :)

I've gone through the examples and Grant's excellent IBM article (great work
on that, Grant!).
So, now I'm at the point where I want to figure out where I go next.
Specifically, I'm a bit fuzzy about common practices when it comes to
using Mahout in my own applications...

Case scenario:
I have my own project, add the Mahout dependencies (through Maven), and
write my own little kMeans test class.
I guess my question is a bit stupid, but how would you go about using Mahout
out of the box?

Ideally (or maybe not?), I figured that I could just take care of providing
the Vectors, push them into Mahout and run the kMeans clustering...
But when I started looking at the kMeans clustering example, I noticed that
there is actually a lot of implementation in the example itself... Is it
really necessary for me to implement all of those methods in every project
where I want to do kMeans? Can't they be reused? The methods I am talking
about are, for instance:
  static List<Canopy> populateCanopies(DistanceMeasure measure, List<Vector> points, double t1, double t2)
  private static void referenceKmeans(List<Vector> points, List<List<Cluster>> clusters, DistanceMeasure measure, int maxIter)
  private static boolean iterateReference(List<Vector> points, List<Cluster> clusters, DistanceMeasure measure)

In my narrow-minded head I would think that there would be some general
kMeans method that takes a List<Vector> as input, returns a
List<List<Cluster>> as output, and does all the internals for me... Or am I
missing something? Or do I have to use KMeansDriver.runJob and read the
input from serialized vectors?
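
To make the question concrete, here is a minimal sketch of the kind of
self-contained, in-memory method described above. It is not Mahout code and
does not use the Mahout API: the class SimpleKMeans and all of its methods
are hypothetical names, plain double[] arrays stand in for Vector and
Cluster, squared Euclidean distance stands in for a DistanceMeasure, and
the initial centroids are passed in by the caller instead of being built
from canopies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Mahout.
public class SimpleKMeans {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, List<double[]> centroids) {
        int best = 0;
        double bestDist = squaredDistance(point, centroids.get(0));
        for (int c = 1; c < centroids.size(); c++) {
            double dist = squaredDistance(point, centroids.get(c));
            if (dist < bestDist) {
                best = c;
                bestDist = dist;
            }
        }
        return best;
    }

    /** One pass: assign every point to its nearest centroid, then recompute the means. */
    static List<double[]> iterate(List<double[]> points, List<double[]> centroids) {
        int k = centroids.size();
        int dim = centroids.get(0).length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            counts[c]++;
            for (int i = 0; i < dim; i++) {
                sums[c][i] += p[i];
            }
        }
        List<double[]> next = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // An empty cluster keeps its previous centroid.
                next.add(centroids.get(c).clone());
                continue;
            }
            double[] mean = new double[dim];
            for (int i = 0; i < dim; i++) {
                mean[i] = sums[c][i] / counts[c];
            }
            next.add(mean);
        }
        return next;
    }

    /**
     * Runs up to maxIter passes and returns the centroids after every pass,
     * with the initial centroids as the first entry. Stops early once no
     * centroid moves by more than epsilon.
     */
    static List<List<double[]>> cluster(List<double[]> points, List<double[]> initial,
                                        int maxIter, double epsilon) {
        List<List<double[]>> history = new ArrayList<>();
        List<double[]> current = initial;
        history.add(current);
        for (int iter = 0; iter < maxIter; iter++) {
            List<double[]> next = iterate(points, current);
            history.add(next);
            boolean converged = true;
            for (int c = 0; c < next.size(); c++) {
                if (squaredDistance(current.get(c), next.get(c)) > epsilon * epsilon) {
                    converged = false;
                }
            }
            current = next;
            if (converged) {
                break;
            }
        }
        return history;
    }
}
```

A caller would then only have to build the list of points and pick the
starting centroids, for example
SimpleKMeans.cluster(points, initialCentroids, 10, 0.001), and read the last
entry of the returned list as the final centroids.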

Appreciate any guidance here guys :)


Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S

