mahout-user mailing list archives

From Jeffrey <mycyber...@yahoo.com>
Subject Re: Clustering (fkmeans) with Mahout using Clojure
Date Tue, 06 Sep 2011 07:53:24 GMT
Hi,

I took a break from this task and moved on to other items on my list. When I revisited it this morning, I found a problem with the sort utility and the LC_COLLATE environment variable that made my sequenceFile generation script fail. I have now managed to get the command-line utility to generate the clusters:

    $ bin/mahout fkmeans --input test/sensei --output test/clusters --clusters test/clusters/clusters-0 \
        --clustering --overwrite --emitMostLikely false --numClusters 3 --maxIter 10 --m 5

However, when I run the cluster dumper, I only see the three cluster center points, not the clustered points themselves, even though I included the --clustering and --emitMostLikely options when running the clustering:

    $ ./bin/mahout clusterdump --seqFileDir test/clusters/clusters-1 --pointsDir test/clusters/clusteredPoints \
        --output sensei.txt


I tested this with the latest revision of mahout-0.6-snapshot.

When I run the clustering from my Clojure code (the same code I posted before), I still get the same error. Any ideas?

Regards,
Jeffrey04



>________________________________
>From: Jake Mannix <jake.mannix@gmail.com>
>To: user@mahout.apache.org; Jeffrey <mycyberpet@yahoo.com>
>Sent: Friday, August 26, 2011 1:23 AM
>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>
>On Thu, Aug 25, 2011 at 10:11 AM, Jeffrey <mycyberpet@yahoo.com> wrote:
>
>I am trying to write a short script to cluster my data via Clojure (calling Mahout classes, though). I have my input data in this format (which is an output from a
>
>On this line you're instantiating a new SequentialAccessSparseVector with its cardinality set to "count (vals photo_list)". All of your Vectors need to have the same cardinality (i.e. they must live in the same vector space, mathematically), so you need to figure out how big they need to be and instantiate them *all* with that cardinality.
>
>    (new SequentialAccessSparseVector (count (vals photo_list)))
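To make the cardinality point concrete, here is a minimal sketch in plain Java with no Mahout dependency. It assumes each input record is a map from a feature name to a value (the actual input format is cut off above). `SparseVec` is a hypothetical stand-in for Mahout's SequentialAccessSparseVector, and `buildDictionary` and `vectorize` are illustrative names, not Mahout API. The idea is to scan all records once, assign every distinct feature a column index, and then create every vector with that one shared cardinality instead of sizing each vector by its own entry count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch in plain JDK Java; not Mahout's API.
public class UniformCardinality {

    /** Minimal sparse vector with a fixed cardinality (stand-in for SequentialAccessSparseVector). */
    static final class SparseVec {
        final int cardinality;
        final TreeMap<Integer, Double> entries = new TreeMap<>();

        SparseVec(int cardinality) {
            this.cardinality = cardinality;
        }

        void set(int index, double value) {
            if (index < 0 || index >= cardinality) {
                throw new IndexOutOfBoundsException(
                    "index " + index + " outside cardinality " + cardinality);
            }
            entries.put(index, value);
        }
    }

    /** Assigns every distinct feature name seen in ANY record a stable column index. */
    static Map<String, Integer> buildDictionary(List<Map<String, Double>> records) {
        TreeSet<String> names = new TreeSet<>(); // sorted, so indices are deterministic
        for (Map<String, Double> r : records) {
            names.addAll(r.keySet());
        }
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String name : names) {
            dict.put(name, dict.size()); // next free index: 0, 1, 2, ...
        }
        return dict;
    }

    /** Builds one vector per record, all sharing the same cardinality (the dictionary size). */
    static List<SparseVec> vectorize(List<Map<String, Double>> records) {
        Map<String, Integer> dict = buildDictionary(records);
        int cardinality = dict.size(); // computed once over all records, not per record
        List<SparseVec> out = new ArrayList<>();
        for (Map<String, Double> r : records) {
            SparseVec v = new SparseVec(cardinality);
            for (Map.Entry<String, Double> e : r.entrySet()) {
                v.set(dict.get(e.getKey()), e.getValue());
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> records = List.of(
            Map.of("a", 1.0, "b", 2.0),
            Map.of("b", 3.0, "c", 4.0, "d", 5.0));
        for (SparseVec v : vectorize(records)) {
            System.out.println(v.cardinality + " " + v.entries);
        }
    }
}
```

With these two sample records, sizing each vector by its own entry count would give cardinalities 2 and 3; computing the size over all records gives 4 for both. From Clojure, the same single value would be passed to `(new SequentialAccessSparseVector ...)` for every record.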
>
>The error you are getting below:
>
>EDIT: apparently cardinality needs to be 1, need to figure out how to do it
>
>is actually telling you that you're trying to say all vectors should have cardinality 1, but it found some vectors with cardinality 10, so it threw an exception.
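The exception described here comes from a cardinality check. The following is a hypothetical pre-flight check in plain Java, not Mahout code: `CardinalityCheck`, `requireUniform` and `CardinalityMismatch` are illustrative names, and it assumes dense `double[]` rows. It performs the same kind of validation, so a mismatch like the 1 vs. 10 case above can be caught before the clustering job is launched.

```java
import java.util.List;

// Hypothetical pre-flight check (not part of Mahout): verifies that every
// vector has the cardinality the job expects before clustering starts.
public class CardinalityCheck {

    /** Thrown when a vector's size differs from the expected cardinality. */
    static final class CardinalityMismatch extends RuntimeException {
        CardinalityMismatch(int expected, int actual, int row) {
            super("Required cardinality " + expected + " but got " + actual + " at row " + row);
        }
    }

    /** Returns the shared cardinality, or throws on the first vector that disagrees. */
    static int requireUniform(List<double[]> vectors, int expected) {
        for (int row = 0; row < vectors.size(); row++) {
            int actual = vectors.get(row).length;
            if (actual != expected) {
                throw new CardinalityMismatch(expected, actual, row);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // Mirrors this thread: the job expects cardinality 1,
        // but the data also contains a 10-dimensional vector.
        List<double[]> vectors = List.of(new double[1], new double[10]);
        try {
            requireUniform(vectors, 1);
        } catch (CardinalityMismatch e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `Required cardinality 1 but got 10 at row 1`. Running a check like this over the generated vectors is a cheap way to confirm they all share one cardinality before handing them to fkmeans.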
> 
>  -jake