mahout-user mailing list archives

From Jeffrey <mycyber...@yahoo.com>
Subject Re: Clustering (fkmeans) with Mahout using Clojure
Date Wed, 07 Sep 2011 06:40:02 GMT
I suspect that the sequenceFile is not being written properly, because the command-line cluster
dumper does not return any points in the dump file :/ Is there any way to verify this?

Best wishes,
Jeffrey04



>________________________________
>From: Jeffrey <mycyberpet@yahoo.com>
>To: "user@mahout.apache.org" <user@mahout.apache.org>
>Sent: Tuesday, September 6, 2011 6:02 PM
>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>
>
>When I switch the runSequential option to true, as shown in the following script
>
>
>    #!./bin/clj
>
>
>    (ns sensei.clustering.fkmeans)
>
>
>    (import org.apache.hadoop.conf.Configuration)
>    (import org.apache.hadoop.fs.Path)
>
>
>    (import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
>    (import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
>    (import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
>
>
>    (let [hadoop_configuration ((fn []
>                                    (let [conf (new Configuration)]
>                                      (. conf set "fs.default.name" "hdfs://localhost:9000/")
>                                      conf)))
>          input_path (new Path "test/sensei")
>          output_path (new Path "test/clusters")
>          clusters_in_path (new Path "test/clusters/cluster-0")]
>      (FuzzyKMeansDriver/run
>        hadoop_configuration
>        input_path
>        (RandomSeedGenerator/buildRandom
>          hadoop_configuration
>          input_path
>          clusters_in_path
>          (int 2)
>          (new EuclideanDistanceMeasure))
>        output_path
>        (new EuclideanDistanceMeasure)
>        (double 0.5)
>        (int 10)
>        (float 5.0)
>        true
>        false
>        (double 0.0)
>        true))
>
>
>I get this output
>
>
>    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>    SLF4J: Defaulting to no-operation (NOP) logger implementation
>    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>    11/09/06 17:58:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>    11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new compressor
>    11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new decompressor
>    Exception in thread "main" java.lang.IllegalStateException: Clusters is empty!
>            at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClustersSeq(FuzzyKMeansDriver.java:361)
>            at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClusters(FuzzyKMeansDriver.java:343)
>            at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:295)
>            at sensei.clustering.fkmeans$eval17.invoke(fkmeans.clj:35)
>            at clojure.lang.Compiler.eval(Compiler.java:6406)
>            at clojure.lang.Compiler.load(Compiler.java:6843)
>            at clojure.lang.Compiler.loadFile(Compiler.java:6804)
>            at clojure.main$load_script.invoke(main.clj:282)
>            at clojure.main$script_opt.invoke(main.clj:342)
>            at clojure.main$main.doInvoke(main.clj:426)
>            at clojure.lang.RestFn.invoke(RestFn.java:436)
>            at clojure.lang.Var.invoke(Var.java:409)
>            at clojure.lang.AFn.applyToHelper(AFn.java:167)
>            at clojure.lang.Var.applyTo(Var.java:518)
>            at clojure.main.main(main.java:37)
>
>
>Am I generating the initial cluster wrong?
>
>
>I rewrote the script to use FuzzyKMeansDriver.run(String[] args), but it still fails with
>the same error as the original program (the output is the same as the initial output; it's
>kinda long, so I am not pasting it again here).
>
>
>    #!./bin/clj
>
>
>    (ns sensei.clustering.fkmeans)
>
>
>    (import org.apache.hadoop.conf.Configuration)
>    (import org.apache.hadoop.fs.Path)
>
>
>    (import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
>    (import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
>    (import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
>
>
>    (let [hadoop_configuration ((fn []
>                                    (let [conf (new Configuration)]
>                                      (. conf set "fs.default.name" "hdfs://localhost:9000/")
>                                      conf)))
>          driver (new FuzzyKMeansDriver)]
>      (. driver setConf hadoop_configuration)
>      (. driver
>         run
>         (into-array String ["--input" "test/sensei"
>                             "--output" "test/clusters"
>                             "--clusters" "test/clusters/clusters-0"
>                             "--clustering"
>                             "--overwrite"
>                             "--emitMostLikely" "false"
>                             "--numClusters" "3"
>                             "--maxIter" "10"
>                             "--m" "5"])))
>
>
>Best wishes,
>Jeffrey04
>
>>________________________________
>>From: Jeffrey <mycyberpet@yahoo.com>
>>To: Jake Mannix <jake.mannix@gmail.com>; "user@mahout.apache.org" <user@mahout.apache.org>
>>Sent: Tuesday, September 6, 2011 3:53 PM
>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>
>>
>>Hi,
>>
>>
>>I took a break from this task and moved on to some other tasks on my list. When I revisited
>>it this morning, I found a problem with the sort utility and the LC_COLLATE environment
>>variable that made my sequenceFile generation script fail. Now I have managed to get the
>>command-line utility to generate the clusters
>>
>>
>>    $ bin/mahout fkmeans --input test/sensei --output test/clusters --clusters test/clusters/clusters-0 --clustering --overwrite --emitMostLikely false --numClusters 3 --maxIter 10 --m 5
>>
>>
>>However, when I run the cluster dumper, I see only the three cluster center points, not
>>the clustered points, although I included the --clustering and --emitMostLikely options
>>when I did the clustering
>>
>>
>>    $ ./bin/mahout clusterdump --seqFileDir test/clusters/clusters-1 --pointsDir test/clusters/clusteredPoints --output sensei.txt
>>
>>
>>
>>I tested this with the latest revision of mahout-0.6-snapshot.
>>
>>
>>When I try to do the clustering with my Clojure code (the same as the one posted before),
>>it still gives me the same error. Any idea?
>>
>>
>>Regards,
>>Jeffrey04
>>
>>
>>
>>>________________________________
>>>From: Jake Mannix <jake.mannix@gmail.com>
>>>To: user@mahout.apache.org; Jeffrey <mycyberpet@yahoo.com>
>>>Sent: Friday, August 26, 2011 1:23 AM
>>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>>
>>>
>>>
>>>
>>>
>>>On Thu, Aug 25, 2011 at 10:11 AM, Jeffrey <mycyberpet@yahoo.com> wrote:
>>>
>>>I am trying to write a short script to cluster my data via Clojure (calling Mahout
>>>classes though). I have my input data in this format (which is an output from a 
>>>>
>>>>
>>>
>>>
>>>In the line below you're instantiating a new SequentialAccessSparseVector, with the
>>>cardinality set to "count (vals photo_list)". All of your Vectors need to have the same
>>>cardinality (i.e. they live in the same vector space, mathematically). So you need to
>>>figure out how big they need to be, and instantiate them *all* with this cardinality.
>>> 
>>>                                      (new SequentialAccessSparseVector (count (vals photo_list)))
>>>>
>>>
>>>
>>>
>>>
>>>The error you are getting below:
>>>
>>>
>>>EDIT: apparently cardinality needs to be 1, need to figure out how to do it
>>>>
>>>
>>>
>>>is actually telling you that you've declared that all vectors should have cardinality 1,
>>>but it found some vectors with cardinality 10, so it threw an exception. 
>>> 
>>>  -jake
>>>
>>>
>>
>>
>
>
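Jake's point about cardinality, quoted above, can be illustrated without Mahout at all. The following is only a minimal sketch in plain Clojure: the `rows` data and the helper names `common-cardinality` and `to-dense` are hypothetical and are not taken from the original script. It assumes each input row is a map from a non-negative integer index to a value, and that at least one row is non-empty.

```clojure
;; Hypothetical sparse rows: each map is {index value}.
;; These values are made up for illustration only.
(def rows
  [{0 1.0, 3 2.0}
   {9 0.5}
   {2 4.0, 5 1.5}])

(defn common-cardinality
  "Smallest cardinality that can hold every index seen in any row.
  Throws if no row contains an index."
  [rows]
  (inc (apply max (mapcat keys rows))))

(defn to-dense
  "Expand one sparse row into a vector of exactly `cardinality` doubles,
  filling missing indices with 0.0."
  [cardinality row]
  (mapv #(double (get row % 0.0)) (range cardinality)))

;; Compute the cardinality ONCE, then reuse it for every row,
;; so that all vectors live in the same vector space.
(def cardinality (common-cardinality rows))
(def vectors (mapv (partial to-dense cardinality) rows))
```

With Mahout, the same idea would presumably mean passing that single shared number to every `(new SequentialAccessSparseVector ...)` call, rather than a per-row count such as `(count (vals photo_list))`.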