mahout-user mailing list archives

From venkata ramana <venkat.ecosyst...@gmail.com>
Subject Re: Clusterdump in mahout
Date Fri, 27 Jun 2014 09:18:25 GMT
Hi,

I have not used reuters-21578 for my k-means.

These are the steps I followed.

I prepared the sequence directory first, then the seq2sparse directory.

./mahout kmeans -Dmapred.map.java.child.opts=-Xmx1g \
-i /urlcat-data/56-categories/vector-dir/tfidf-vectors/ \
-c /urlcat-data/56-categories/cluster-centroids \
-o /urlcat-data/56-categories/kmeans-cluster-output \
-ow -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow \
-cd 1 -k 49 --clustering -cl

mahout clusterdump -i /opt/49-classification/cluster-centroids \
-o /opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt \
-p /opt/49-classification/kmeans-cluster-output/clusteredPoints/ \
-d /root/Desktop/final_feature_dictionaries.txt -dt text -e;

I have checked examples/bin/cluster-reuters.sh and downloaded reuters-21578.

Can you please let me know what I should do now?

Thanks,
Venkat


On Thu, Jun 26, 2014 at 6:46 PM, Suneel Marthi <smarthi@apache.org> wrote:

> No, a dictionary is not a file of 'crisp keywords' to clusters mapping. A
> dictionary is a mapping of keywords to a unique integerId.
>
> Again, it would be easier to help if you could outline the steps you
> followed to generate the clusters. It seems you might have missed
> something; at the very least, look at the kmeans example in
> examples/bin/cluster-reuters.sh for the correct sequence of steps.
>
>
> On Thu, Jun 26, 2014 at 5:07 AM, venkata ramana <venkat.ecosystems@gmail.com> wrote:
>
> > As per my understanding, the dictionary file contains crisp keywords which
> > are related to the clusters. Please let me know if I am wrong.
> >
> > Thanks,
> > Venkat
> >
> >
> > On Thu, Jun 26, 2014 at 1:27 PM, Suneel Marthi <smarthi@apache.org> wrote:
> >
> > > It's clear from the stack trace that you have a String as key where an
> > > integer was expected.
> > > How did you go about building your clusters from the original input?
> > >
> > >
> > > On Thu, Jun 26, 2014 at 3:28 AM, venkata ramana <venkat.ecosystems@gmail.com> wrote:
> > >
> > > > Hi Mahout,
> > > >
> > > > I am trying to analyze my k-means clusters. I used the following
> > > > command.
> > > >
> > > > mahout clusterdump -i /opt/49-classification/cluster-centroids \
> > > > -o /opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt \
> > > > -p /opt/49-classification/kmeans-cluster-output/clusteredPoints/ \
> > > > -d /root/Desktop/final_feature_dictionaries.txt -dt text -e;
> > > >
> > > > I got the following error.
> > > >
> > > > hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
> > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > SLF4J: Found binding in
> > > > [jar:file:/opt/Gouri_Sankar/mahout-distribution-0.8/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > SLF4J: Found binding in
> > > > [jar:file:/opt/Gouri_Sankar/mahout-distribution-0.8/lib/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > > explanation.
> > > > SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
> > > > Jun 26, 2014 12:43:40 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Command line arguments:
> > > > {--dictionary=[/root/Desktop/final_feature_dictionaries.txt],
> > > > --dictionaryType=[text],
> > > > --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> > > > --endPhase=[2147483647], --evaluate=null,
> > > > --input=[/opt/49-classification/cluster-centroids],
> > > > --output=[/opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt],
> > > > --outputFormat=[TEXT],
> > > > --pointsDir=[/opt/49-classification/kmeans-cluster-output/clusteredPoints/],
> > > > --startPhase=[0], --tempDir=[temp]}
> > > > Exception in thread "main" java.lang.NumberFormatException: For input string: "aajproperty.com"
> > > >     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > > >     at java.lang.Integer.parseInt(Integer.java:492)
> > > >     at java.lang.Integer.parseInt(Integer.java:527)
> > > >     at org.apache.mahout.utils.vectors.VectorHelper.loadTermDictionary(VectorHelper.java:218)
> > > >
> > > >
> > > > I have not used any numbers in my dictionary file. Could you please
> > > > help me on this.
> > > >
> > > > Thanks,
> > > > Venkat
> > > >
> > >
> >
>
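
To illustrate the point made in the thread (a dictionary maps each keyword to a unique integer id, so a file of bare keywords fails integer parsing), here is a minimal Python sketch. It is not Mahout code. The file layout it assumes (a first line holding the entry count, then tab-separated lines of term, document frequency, and integer index) is an assumption about the text dictionary format and should be checked against the Mahout version in use. The function name and the term "zillow.com" are hypothetical.

```python
def load_term_dictionary(lines):
    """Parse a text dictionary into a list where position i holds term i.

    Assumed layout (not confirmed by the thread): the first line is the
    number of entries, then 'term<TAB>doc_freq<TAB>index' per line.
    """
    it = iter(lines)
    # A bare keyword on the first line fails here, much like the
    # NumberFormatException in the stack trace above.
    num_entries = int(next(it).strip())
    terms = [None] * num_entries
    for line in it:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # skip comment/header lines
        tokens = line.split("\t")
        if len(tokens) < 3:
            continue  # skip lines that lack an index column
        terms[int(tokens[2])] = tokens[0]
    return terms


# A dictionary in the assumed layout: each keyword carries an integer id.
good = ["2", "#term\tdoc_freq\tidx", "aajproperty.com\t3\t0", "zillow.com\t5\t1"]
# A plain keyword list, like the hand-made file described in the thread.
bad = ["aajproperty.com", "zillow.com"]

good_terms = load_term_dictionary(good)

try:
    load_term_dictionary(bad)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)
```

Under these assumptions, the keyword-only file fails on its very first line, which matches the input string reported in the exception. The sequence of steps in examples/bin/cluster-reuters.sh, which Suneel points to, is the reference for how the dictionary should be produced and passed to clusterdump.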
