mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: ClusteringUtils for Kmeans output
Date Sun, 09 Mar 2014 15:33:45 GMT
Can you file a JIRA and attach your patch?


On Sun, Mar 9, 2014 at 8:03 AM, Bikash Gupta <bikash.gupta11@gmail.com> wrote:

> Info for everyone
>
> I have successfully forced Mahout to build with Guava 11.0.2. The errors
> and fixes are listed below.
>
> 1. Class: org.apache.mahout.math.stats.GroupTree
> - Change line 171 to: stack = new ArrayDeque<GroupTree>();
> - Add import: java.util.ArrayDeque
>
> 2. Class: org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> - Guava 11.0.2 doesn't have Closer in com.google.common.io, so I used
> try-with-resources instead
> - Changed the Java version to 1.7
> - Code changed as shown below
>
>     try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
>          DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream)) {
>       PolymorphicWritable.write(dataOutputStream, lr);
>       output = byteArrayOutputStream.toByteArray();
>     }
>
>     OnlineLogisticRegression read;
>
>     try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(output);
>          DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
>       read = PolymorphicWritable.read(dataInputStream, OnlineLogisticRegression.class);
>     }
>
> 3. Class: org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> - Iterators.advance is not present in 11.0.2, so I replaced the call with
> an equivalent loop. Sample shown below:
>
>     int numberToAdvance = 1;
>     int iterateNumberToAdvance;
>     for (iterateNumberToAdvance = 0;
>          iterateNumberToAdvance < numberToAdvance && iterator.hasNext();
>          iterateNumberToAdvance++) {
>       iterator.next();
>     }
>
> If anyone has a better suggestion, please let me know.
>
> @Suneel,
>
> Going back to my original question: I was able to call ClusteringUtils for
> Kmeans. However, I cannot use ClusterQualitySummarizer because it doesn't
> support WeightedPropertyVectorWritable.
>
>
>
> On Sun, Mar 9, 2014 at 6:28 PM, Bikash Gupta <bikash.gupta11@gmail.com> wrote:
>
> > Just FYI: downgrading Guava to 11.0.2 has fixed the build error in
> > mahout-math, as suggested by Ted. However, it is causing another build
> > error in mahout-core:
> >
> > [INFO] -------------------------------------------------------------
> > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[24,28] cannot find symbol
> >   symbol:   class Closer
> >   location: package com.google.common.io
> > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,5] cannot find symbol
> >   symbol:   class Closer
> >   location: class org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,21] cannot find symbol
> >   symbol:   variable Closer
> >   location: class org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> >
> >
> > On Sun, Mar 9, 2014 at 3:45 PM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> >
> >> Darn. You are the second person to report that this week. Change that
> >> line to what Ted suggested. The issue is an incompatibility between the
> >> Guava version Mahout uses and Hadoop's antiquated Guava version.
> >>
> >> Sent from my iPhone
> >>
> >> On Mar 9, 2014, at 6:10 AM, Bikash Gupta <bikash.gupta11@gmail.com>
> >> wrote:
> >>
> >> I am able to run ClusteringUtils on Kmeans successfully (I still need to
> >> check the scenario you mentioned). However, I am getting an error from
> >> the TDigest class:
> >>
> >> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
> >>     at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
> >>     at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
> >>     at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
> >>     at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
> >>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
> >>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
> >>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
> >>     at org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
> >>     at org.apache.mahout.clustering.ClusteringUtils.summarizeClusterDistances(ClusteringUtils.java:65)
> >>
> >> A few days ago I saw a post where a user hit a similar issue in the
> >> TDigest class. Ted suggested replacing the line with the code below:
> >>
> >> stack = new ArrayDeque<GroupTree>();
> >>
> >> Let me know if I am correct.
> >>
> >>
> >> On Sun, Mar 9, 2014 at 3:18 PM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> >>
> >>> You could call ClusterQualitySummarizer, which then calls ClusteringUtils
> >>> to produce the different metrics you specified.
> >>> For an example, see the Streaming KMeans section in
> >>> examples/bin/cluster-reuters.sh.
> >>>
> >>> It calls 'qualcluster' with options -i <tf-idf vectors generated from
> >>> seq2sparse> -c <output of Kmeans> -o <output file generated with the
> >>> metrics>
> >>>
> >>>
> >>> I have not tried this on KMeans, and since the output format of KMeans
> >>> is different from that of Streaming KMeans, this might just fall flat.
> >>> It may also fail to read some of the clusters if a cluster has only
> >>> a single clustered point, because the new TDigest summarizer expects
> >>> at least 2 points in order to calculate max, quartiles, and mean.
> >>>
> >>> On Sunday, March 9, 2014 4:19 AM, Bikash Gupta <bikash.gupta11@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I want to use ClusteringUtils on Kmeans clusteredPoints to get
> >>> summarizeClusterDistances, daviesBouldinIndex, and dunnIndex.
> >>>
> >>> Is there a sample or example showing how to use these features?
> >>> --
> >>> Thanks & Regards
> >>> Bikash Kumar Gupta
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks & Regards
> >> Bikash Kumar Gupta
> >>
> >>
> >
> >
> > --
> > Thanks & Regards
> > Bikash Kumar Gupta
> >
>
>
>
> --
> Thanks & Regards
> Bikash Kumar Gupta
>
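
The loop given for fix 3 in the quoted message can be wrapped as a small reusable helper. This is only a sketch: the class and method names are hypothetical (they are not Mahout or Guava API), and it assumes nothing beyond a plain java.util.Iterator. It returns the number of elements actually skipped, so a caller can tell when the iterator ran out early.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper for illustration; not part of Mahout or Guava.
final class IteratorAdvanceSketch {
    private IteratorAdvanceSketch() {}

    /** Skips up to n elements and returns how many were actually skipped. */
    static int advance(Iterator<?> iterator, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        int skipped = 0;
        // Stop early if the iterator is exhausted before n elements are consumed.
        while (skipped < n && iterator.hasNext()) {
            iterator.next();
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        int skipped = advance(it, 2);
        // Prints "2 skipped, next is c"
        System.out.println(skipped + " skipped, next is " + it.next());
    }
}
```

Keeping the hasNext() check in the loop condition, as the quoted fix does, means asking for more elements than remain does not throw NoSuchElementException.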

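The try-with-resources pattern used for fix 2 in the quoted message can be tried without any Mahout classes. The sketch below is an assumption-laden stand-in: it writes an int and a String with the JDK's DataOutputStream instead of calling PolymorphicWritable, and the class and method names are hypothetical. It needs Java 1.7 or later, as the quoted message notes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the test's serialization round trip; no Mahout or Guava needed.
final class RoundTripSketch {
    private RoundTripSketch() {}

    /** Serializes the two values; both streams are closed when the try block exits. */
    static byte[] write(int value, String label) throws IOException {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
            out.writeUTF(label);
            out.flush();
            return bytes.toByteArray();
        }
    }

    /** Reads the values back in the same order they were written. */
    static String read(byte[] data) throws IOException {
        try (ByteArrayInputStream bytes = new ByteArrayInputStream(data);
             DataInputStream in = new DataInputStream(bytes)) {
            int value = in.readInt();
            String label = in.readUTF();
            return label + "=" + value;
        }
    }

    /** Convenience wrapper that converts the checked exception for callers. */
    static String roundTrip(int value, String label) {
        try {
            return read(write(value, label));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints "weight=42"
        System.out.println(roundTrip(42, "weight"));
    }
}
```

Resources declared in the try header are closed in reverse order of declaration, which is the job Closer was doing in the original test.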