mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: can't get <point-id, cluster-id> thru "-p"
Date Wed, 14 Mar 2012 19:13:32 GMT
The -p parameter is an input. You should pass in the clusteredPoints/
directory that was generated by the cluster driver you used.
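
Since -p is only read and never written, it can help to confirm the directory exists before running clusterdump. A rough sketch, assuming the hadoop command is on your PATH; check_points_dir is a helper name I made up, not part of Mahout, and the path is the one from my own run:

```shell
#!/bin/sh
# Sketch only: check_points_dir is a made-up helper, not part of Mahout.
# Returns 0 if the given path exists on the Hadoop file system.
check_points_dir() {
    hadoop fs -test -e "$1"
}

# Assumed location of the clustered points (taken from my fkmeans run).
POINTS=wikipedia-fkmeans-clusters/clusteredPoints

if command -v hadoop >/dev/null 2>&1; then
    if check_points_dir "$POINTS"; then
        echo "ok: $POINTS exists, pass it to clusterdump -p"
    else
        echo "missing: $POINTS (nothing there for clusterdump -p to read)"
    fi
else
    echo "hadoop not on PATH, skipping check"
fi
```

If the directory is missing, the place to look is the clustering step rather than clusterdump.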

My use of fkmeans might be an example:

    mahout fkmeans -i wikipedia-vectors/tfidf-vectors/ \
        -c wikipedia-fkmeans-centroids -o wikipedia-fkmeans-clusters \
        -k 100 -m 2 -ow -x 10 \
        -dm org.apache.mahout.common.distance.CosineDistanceMeasure

This will create
wikipedia-fkmeans-clusters/clusteredPoints/part-m-00000, which is the
file with the clustered points. I then did a clusterdump:

    mahout clusterdump \
        -s wikipedia-fkmeans-clusters/clusters/clusters-1/part-r-00000 \
        -p wikipedia-fkmeans-clusters/clusteredPoints/ \
        -d wikipedia-fkmeans-clusters/dictionary.file-0 -dt sequencefile \
        -dm org.apache.mahout.common.distance.CosineDistanceMeasure

This will output to the screen. Use -o to specify an output file.
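
A minimal sketch of the -o variant, using the same paths as my run. The function name clusterdump_cmd, the CLUSTERS and OUT variables, and the file name clusterdump.txt are only illustrative assumptions. The script prints the command and runs it only if mahout is actually installed:

```shell
#!/bin/sh
# Sketch only: function and variable names are made up for illustration.
CLUSTERS=wikipedia-fkmeans-clusters
OUT=clusterdump.txt   # hypothetical output file name

# Print the full clusterdump command line, including -o.
clusterdump_cmd() {
    echo mahout clusterdump \
        -s "$CLUSTERS/clusters/clusters-1/part-r-00000" \
        -p "$CLUSTERS/clusteredPoints/" \
        -d "$CLUSTERS/dictionary.file-0" -dt sequencefile \
        -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
        -o "$OUT"
}

# Show the command, then run it only when mahout is on the PATH.
clusterdump_cmd
if command -v mahout >/dev/null 2>&1; then
    eval "$(clusterdump_cmd)"
fi
```

With -o the dump goes to the named file instead of the screen, which is easier to search when there are thousands of clusters.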

Good advice for any user of mahout is to read the help output very
carefully. IMHO it is very easy to misunderstand the parameters, inputs,
and outputs. I think I only understand about 10%. Try:

    mahout fkmeans --help


On 3/14/12 10:52 AM, Baoqiang Cao wrote:
> Hi,
>
> Very sorry for such a trivial question but ran out of luck. I'm trying
> to see which points (thru point-ids) belong to which cluster center.
> Here is what I did:
>
> mahout clusterdump -s /mahout/kmeans/clusters-15-final -d
> /mahout/sparse/dictionary.file-0 -dt sequencefile   -p /mahout/points
>> out
> The onscreen output is:
>
> 12/03/14 12:39:52 INFO common.AbstractJob: Command line arguments:
> {--dictionary=/mahout/sparse/dictionary.file-0,
> --dictionaryType=sequencefile,
> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
> --endPhase=2147483647, --outputFormat=TEXT,
> --pointsDir=/mahout/points,
> --seqFileDir=/mahout/kmeans/clusters-15-final, --startPhase=0,
> --tempDir=temp}
> 12/03/14 12:39:55 WARN snappy.LoadSnappy: Snappy native library is available
> 12/03/14 12:39:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 12/03/14 12:39:55 INFO snappy.LoadSnappy: Snappy native library loaded
> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
> 12/03/14 12:42:07 INFO clustering.ClusterDumper: Wrote 5188 clusters
> 12/03/14 12:42:07 INFO driver.MahoutDriver: Program took 135276 ms
> (Minutes: 2.2546)
>
>
> There is nothing under "/mahout/points". Any help on why and how?
>
> Thanks in advance.
> Baoqiang
>
