mahout-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Clustering Demo
Date Thu, 08 May 2008 15:27:30 GMT
Grant Ingersoll wrote:
> Anyone have any sample code or demo of running the clustering over a 
> large collection of documents that they could share?  Mainly looking for 
> an example of taking some corpus, converting it into the appropriate 
> Mahout representation and then running either the k-means or the canopy 
> clustering on it.

It would be way cool to do this with the industry-standard 20 Newsgroups 
corpus - there have been many experiments and evaluations of this 
corpus, so it's good as a baseline.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

