Hi all,

I've been following the development of Mahout for quite a while now and figured it was time to get my hands dirty :) I've gone through the examples and Grant's excellent IBM article (great work on that, Grant!). Now I'm at the point where I want to figure out where to go next. Specifically, I'm a bit fuzzy about common practices when it comes to using Mahout in my own applications.

Case scenario: I have my own project, add the Mahout dependencies (through Maven), and write my own little kMeans test class.

My question may be a bit stupid, but how would you go about using Mahout out of the box? Ideally (or maybe not?), I figured I could just provide the Vectors, push them into Mahout, and run the kMeans clustering. But when I started looking at the kMeans clustering example, I noticed that there is actually a lot of implementation in the example itself. Is it really necessary for me to implement all of those methods in every project where I want to do kMeans? Can't they be reused?

The methods I'm talking about are, for instance:

  static List<Canopy> populateCanopies(DistanceMeasure measure, List<Vector> points, double t1, double t2)
  private static void referenceKmeans(List<Vector> points, List<List<Cluster>> clusters, DistanceMeasure measure, int maxIter)
  private static boolean iterateReference(List<Vector> points, List<Cluster> clusters, DistanceMeasure measure)

In my narrow-minded head, I would think that the input would be a List<Vector> and the output a List<Cluster> from some general kMeans method that does all the internals for me. Or am I missing something? Or do I have to use KMeansDriver.runJob and read the input from serialized vector files?

I'd appreciate any guidance here, guys :)

Cheers,
Aleksander

--
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco
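
P.S. To make the question concrete, here is a rough sketch of the kind of "points in, clusters out" call I have in mind. This is not Mahout code and does not use any Mahout classes: the class name SimpleKMeans and the method cluster are made up for illustration, plain double[] arrays stand in for Mahout's Vector, and a hard-coded squared Euclidean distance stands in for a DistanceMeasure. It only shows the shape of the API I was hoping to find, assuming the caller supplies the initial centroids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, NOT part of Mahout: plain in-memory k-means on double[] points. */
public class SimpleKMeans {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, List<double[]> centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.size(); c++) {
            double dist = squaredDistance(point, centroids.get(c));
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /**
     * Repeats the assign/recompute steps from the given initial centroids
     * until no centroid moves or maxIter is reached; returns the final centroids.
     */
    public static List<double[]> cluster(List<double[]> points, List<double[]> initial, int maxIter) {
        List<double[]> centroids = new ArrayList<>();
        for (double[] c : initial) {
            centroids.add(c.clone()); // do not modify the caller's seeds
        }
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[centroids.size()][dim];
            int[] counts = new int[centroids.size()];
            // Assignment step: add each point to the running sum of its nearest centroid.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each non-empty centroid to the mean of its points.
            boolean moved = false;
            for (int c = 0; c < centroids.size(); c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old centroid
                }
                double[] mean = new double[dim];
                for (int i = 0; i < dim; i++) {
                    mean[i] = sums[c][i] / counts[c];
                }
                if (!Arrays.equals(mean, centroids.get(c))) {
                    moved = true;
                }
                centroids.set(c, mean);
            }
            if (!moved) {
                break; // converged
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {1, 1}, new double[] {1, 2},
            new double[] {8, 8}, new double[] {9, 8});
        List<double[]> seeds = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        for (double[] c : cluster(points, seeds, 10)) {
            System.out.println(Arrays.toString(c)); // prints [1.0, 1.5] then [8.5, 8.0]
        }
    }
}
```

What I'm really asking is whether Mahout already exposes something with roughly this shape for its own Vector and Cluster types, so that I don't have to copy the loop above (or the reference methods from the example) into every project.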