mahout-user mailing list archives

From Mark <mar...@sisa.samsung.com>
Subject Re: Graphical Mahout Cluster Visualization Tools?
Date Fri, 04 Nov 2011 01:29:15 GMT
Grant Ingersoll <gsingers <at> apache.org> writes:
> 
> I've tried various open source tools (Gephi, others), but haven't found one
> yet that can handle large volumes of points in an efficient way. FWIW, the
> Carrot2 workbench is BSD-licensed; perhaps it could be used with some work?
> 
> That being said, I did recently add the ability for ClusterDumper to output
> CSV or GraphML, as well as made it pluggable so one can output whatever
> format you wish.
> 
Grant,

That was quick! Unfortunately, I don't have much experience with these tools.
My current tool chain (Solr -> mahout lucene.vector -> mahout canopy ->
mahout kmeans -> mahout clusterdump) reports that exactly one cluster was
written. However, the CSV file created by clusterdump is zero length.

I'm also seeing no variation in the results despite changing the k-means
distance measures and the canopy t1 and t2 parameters. I'm currently running
Dirichlet clustering on my 8k-document data set in an attempt to understand
the structure of my data better.
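One way to see how the canopy t1 and t2 parameters can collapse everything into exactly one cluster is a minimal sketch of textbook canopy clustering (McCallum et al.). This is not Mahout's implementation: the cosine distance, the function names `cosine_distance` and `canopy`, and the toy vectors are illustrative assumptions. Each pass takes the first remaining point as a center, puts every remaining point within t1 into its canopy, and removes every point within t2 from further consideration. In this sketch, cosine distance on non-negative vectors never exceeds 1, so any t2 above that bound removes every point on the first pass and yields a single canopy.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; stays within [0, 1] for non-negative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat empty vectors as maximally distant (an assumption)
    return 1.0 - dot / (na * nb)


def canopy(points: List[Sequence[float]], t1: float, t2: float) -> List[List[int]]:
    """Textbook canopy clustering; returns canopies as lists of point indices."""
    if t1 < t2:
        raise ValueError("canopy clustering requires t1 >= t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Loose threshold t1: which remaining points join this canopy.
        members = [i for i in remaining
                   if cosine_distance(points[center], points[i]) < t1]
        canopies.append(members)
        # Tight threshold t2: points this close can no longer start a canopy.
        # The center is always dropped so the loop terminates.
        remaining = [i for i in remaining[1:]
                     if cosine_distance(points[center], points[i]) >= t2]
    return canopies


# Hypothetical toy "documents": two pairs of nearly parallel vectors.
pts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(canopy(pts, t1=0.5, t2=0.3))  # [[0, 1], [2, 3]]
print(canopy(pts, t1=1.5, t2=1.1))  # [[0, 1, 2, 3]]: t2 exceeds every distance
```

If Mahout's canopy driver behaves similarly, checking t2 against the actual range of the chosen distance measure would be a cheap first step before varying the k-means settings.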

Any advice?

Thanks,

Mark


