mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Reentering at the ground floor
Date Sat, 05 Mar 2011 22:56:08 GMT
Quickstart:
   https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart

JIRAs with recent activity:
    https://issues.apache.org/jira/browse/MAHOUT-588
    https://issues.apache.org/jira/browse/MAHOUT-551
    https://issues.apache.org/jira/browse/MAHOUT-390

Chapters 6-12 of Mahout in Action (MiA) (conflict of interest alert!)

Hashed vector encoding:
    https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/encoders/package-summary.html

This won't be as good as you would like in terms of fit and finish.  All
contributions toward that end are VERY welcome.
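
For a feel of what hashed vector encoding does, here is a minimal sketch in
plain Java. It does not use the Mahout encoder classes from the javadoc above;
the class name, the encode/hash helpers, the FNV-1a hash, and the vector size
of 16 are all assumptions made for illustration. The idea is only that each
token is hashed straight to a vector index, so no dictionary is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Illustrative only: a hand-rolled hashing-trick encoder, not Mahout's API.
public class HashedEncodingSketch {

    // Hypothetical helper: count each token into one of `size` buckets.
    static double[] encode(String text, int size) {
        double[] vector = new double[size];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            // floorMod keeps the index non-negative for negative hashes.
            int bucket = Math.floorMod(hash(token), size);
            vector[bucket] += 1.0;
        }
        return vector;
    }

    // 32-bit FNV-1a, chosen here only because it is short and deterministic.
    static int hash(String token) {
        int h = 0x811c9dc5;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    public static void main(String[] args) {
        double[] v = encode("the quick brown fox jumps over the lazy dog", 16);
        System.out.println(Arrays.toString(v));
    }
}
```

Distinct tokens can collide in the same bucket; a larger vector makes that
rarer at the cost of more memory.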


On Sat, Mar 5, 2011 at 12:03 PM, Benson Margulies <bimargulies@gmail.com> wrote:

> I may have finally been handed a reason to make a serious attempt to
> use mahout, and here I am more or less where I tried to start a very
> long time ago.
>
> Imagine that someone else has gone and stuck a large number of text
> docs into a hadoop file system. I want to
>
> a- convert them to feature vectors
> b- run canopy+kmeans or some such clusterer
> c- report back the assignment of docs to clusters
>
> Where should I start reading in the web site?
>
