mahout-user mailing list archives

From Yosep Kim <yos...@gmail.com>
Subject Re: How to convert
Date Thu, 11 Aug 2011 23:10:52 GMT
Hello, Jeff:

I reran the commands with the parameters you suggested.  However, when I
ran the following clusterdump command, I still got the same output:

   mahout clusterdump -s /user/hadoop/articles-kmeans/clusters-1 \
     -d /user/hadoop/articles-seqdir-sparse-kmeans/dictionary.file-0 \
     -dt sequencefile -b 100 -n 20

Am I missing some arguments?

Thanks again for your help, Jeff.

On Thu, Aug 11, 2011 at 6:49 PM, Yosep Kim <yosepk@gmail.com> wrote:

> What a fast response!!!  Thanks for the quick answer. I will let you know
> how it goes!  Thanks!
>
>
> On Thu, Aug 11, 2011 at 6:47 PM, Jeff Eastman <jeastman@narus.com> wrote:
>
>> You'll want to add the -nv option to seq2sparse to get NamedVectors out
>> and add the -cl argument to k-means to get the clustered documents. Then the
>> clusterdump should give you what you are seeking.
>>
>> -----Original Message-----
>> From: Yosep Kim [mailto:yosepk@gmail.com]
>> Sent: Thursday, August 11, 2011 3:43 PM
>> To: user@mahout.apache.org
>> Subject: How to convert
>>
>> Hello, Everyone!
>>
>> This is Yosep Kim, and I just started playing with Mahout.
>> I successfully installed it on my box and got an example data set
>> clustered using the K-Means clustering algorithm.  My input data was all
>> text documents (i.e. new articles).  When I ran the clusterdump command,
>> I got some cool information.  However, I was not able to find a way to
>> map this back to the original documents.  It looks like the algorithm
>> created clusters based on all the words inside the documents.  Did I
>> understand this correctly?  How can I create clusters based on documents
>> so I can see that "document1.txt and document2.txt are in Cluster 1"?
>> I'd appreciate your help!!  Thanks.
>>
>>
>> :CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008, 001:0.015, 00139:0.014, 001
>>        Top Terms:
>>                c                                       => 2.458502088406289
>>                software                                => 2.375095306671867
>>                java                                    => 2.2093305677868598
>>                project                                 => 1.989917316871096
>>                application                             => 1.957329582567363
>>                using                                   => 1.916300386652466
>>                web                                     => 1.9046723985856817
>>                development                             => 1.8707247066867443
>>
>> By the way, Mahout is way cool, and I can't wait to be part of this
>> "movement".
>>
>> Yosep
>>
>
>
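
For readers following this thread, the three steps Jeff describes
(seq2sparse with -nv, k-means with -cl, then clusterdump) can be sketched as
the shell script below.  It is a sketch under assumptions, not a verified
recipe.  The SEQDIR path, the initial-clusters directory, the tfidf-vectors
input, and the -k and -x values are all hypothetical and do not appear in the
thread.  The clusterdump call also passes -p, pointing at the clusteredPoints
directory that k-means is expected to write when run with -cl.  That argument
is not in the thread either; it is an assumption about what the command quoted
above may be missing.  By default the script only prints the commands.

```shell
#!/bin/sh
# Dry run by default: each command is echoed, not executed.
# Run with MAHOUT=mahout in the environment to execute for real.
MAHOUT="${MAHOUT:-echo mahout}"

# Paths. SPARSE and KMEANS come from the thread; SEQDIR is hypothetical.
SEQDIR=/user/hadoop/articles-seqdir
SPARSE=/user/hadoop/articles-seqdir-sparse-kmeans
KMEANS=/user/hadoop/articles-kmeans

# 1. Vectorize with -nv so each vector is a NamedVector that keeps
#    its document name.
$MAHOUT seq2sparse -i "$SEQDIR" -o "$SPARSE" -nv

# 2. Cluster with -cl so the documents themselves are assigned to clusters.
#    The -k and -x values and the initial-clusters path are hypothetical.
$MAHOUT kmeans -i "$SPARSE/tfidf-vectors" -c "$KMEANS/initial-clusters" \
  -o "$KMEANS" -k 20 -x 10 -cl -ow

# 3. Dump the clusters. -p (assumed, not from the thread) points at the
#    clustered points so the member documents can be listed per cluster.
$MAHOUT clusterdump -s "$KMEANS/clusters-1" -p "$KMEANS/clusteredPoints" \
  -d "$SPARSE/dictionary.file-0" -dt sequencefile -b 100 -n 20
```

The cluster directory to dump depends on how many iterations k-means
actually ran; clusters-1 is simply the one named in the thread.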
