mahout-user mailing list archives

From: Lance Norskog <goks...@gmail.com>
Subject: Re: How to determine which cluster an item belongs to
Date: Mon, 09 Jan 2012 02:39:47 GMT
The ClusterDumper job can write the cluster data in various output
formats. The CSV and GraphML (XML) formats are parseable and include the
dictionary. I do not know how much information about the cluster
structure is thrown away by these output formats.

The Gephi program reads GraphML and you can use it to visually explore
your clusters.
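(Editor's note: for anyone who wants to script this, below is a minimal, untested sketch of
driving ClusterDumper programmatically through Hadoop's ToolRunner. The option names, in
particular -i/-p/-o and -of GRAPH_ML, are from memory and vary between Mahout versions, and
the paths are placeholders; treat it as a starting point and check "mahout clusterdump --help"
for your release.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.utils.clustering.ClusterDumper;

public class DumpClustersAsGraphML {
  public static void main(String[] args) throws Exception {
    // ClusterDumper is a Hadoop Tool, so it can be run from Java as well as from bin/mahout.
    // All paths are placeholders; -of is the assumed flag for the output format (TEXT, CSV, GRAPH_ML).
    ToolRunner.run(new Configuration(), new ClusterDumper(), new String[] {
        "-i",  "output/clusters-10-final",   // final cluster centers
        "-p",  "output/clusteredPoints",     // point-to-cluster assignments
        "-o",  "clusters.graphml",           // local file to write
        "-of", "GRAPH_ML"                    // or CSV / TEXT
    });
  }
}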

Lance

On Sat, Jan 7, 2012 at 12:57 PM, praneet mhatre <praneetmhatre@gmail.com> wrote:
> This seems to work perfectly. Thank you Sean!
>
> On Sat, Jan 7, 2012 at 12:36 PM, praneet mhatre <praneetmhatre@gmail.com> wrote:
>
>> Hi Sean,
>>
>> I tried passing the file too, but doing so gives me the following error:
>>
>>
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>> [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>> [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>> explanation.
>> 12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 1
>> 12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 2
>> 12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 3
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 4
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 5
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 6
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 7
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 8
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 9
>> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 10
>> java.lang.IllegalStateException:
>> file:/home/praneet/Eclipse-Output/output/clusters-10-final/clusters-10
>>     at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:82)
>>     at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:1)
>>     at com.google.common.collect.Iterators$8.next(Iterators.java:667)
>>     at com.google.common.collect.Iterators$5.hasNext(Iterators.java:475)
>>     at
>> com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:39)
>>     at
>> org.apache.mahout.clustering.dirichlet.DirichletClusterMapper.loadClusters(DirichletClusterMapper.java:68)
>>     at
>> org.apache.mahout.clustering.dirichlet.DirichletDriver.clusterDataSeq(DirichletDriver.java:487)
>>     at
>> org.apache.mahout.clustering.dirichlet.DirichletDriver.clusterData(DirichletDriver.java:474)
>>     at
>> org.apache.mahout.clustering.dirichlet.DirichletDriver.run(DirichletDriver.java:172)
>>     at
>> org.apache.mahout.clustering.TestClusterDumper.testDirichlet2(TestClusterDumper.java:297)
>>     at org.apache.mahout.clustering.Test.main(Test.java:40)
>> Caused by: java.io.FileNotFoundException:
>> /home/praneet/Eclipse-Output/output/clusters-10-final/clusters-10 (Is a
>> directory)
>>
>>     at java.io.FileInputStream.open(Native Method)
>>     at java.io.FileInputStream.<init>(FileInputStream.java:137)
>>     at
>> org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:70)
>>     at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:106)
>>     at
>> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
>>     at
>> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>>     at
>> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:51)
>>     at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:78)
>>     ... 10 more
>>
>> This is what I get when I try
>>
>> Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints/part-m-0");
>>
>> instead of
>>
>> Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");
>>
>> Since the directory has only one file, part-m-0, I do not need to read the
>> whole directory. But I'll still try the approach you suggested and see how
>> things work out.
>>
>>
>>
>>
>> On Fri, Jan 6, 2012 at 9:09 PM, Sean Owen <srowen@gmail.com> wrote:
>>
>>> The error is right there:
>>>
>>> Exception in thread "main" java.io.FileNotFoundException:
>>> /home/praneet/Eclipse-Output/output/clusteredPoints (Is a directory)
>>>
>>> You are passing a directory, not a file.
>>> Look at the class SequenceFileDirIterable for an easy way to iterate
>>> over all files in a directory as key-value pairs.
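(Editor's note: a minimal sketch of what Sean's suggestion looks like in code, assuming the
IntWritable/WeightedVectorWritable key-value types that clusteredPoints is written with, and
reusing Praneet's path from above. SequenceFileDirIterable and PathType are the Mahout classes
referred to here; constructor details may differ slightly between versions.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.mahout.clustering.WeightedVectorWritable;
import org.apache.mahout.common.Pair;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterable;

public class PrintClusterAssignments {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    Path clusteredPoints = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");

    // SequenceFileDirIterable walks every sequence file under the directory and yields
    // key-value Pairs, so there is no need to name part-m-0 explicitly.
    for (Pair<IntWritable, WeightedVectorWritable> record :
             new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
                 clusteredPoints, PathType.LIST, conf)) {
      System.out.println(record.getSecond() + " is in cluster " + record.getFirst());
    }
  }
}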
>>>
>>> On Sat, Jan 7, 2012 at 3:01 AM, praneet mhatre <praneetmhatre@gmail.com>
>>> wrote:
>>> > Hi Abin and Petar,
>>> >
>>> > I tried the above approach with Dirichlet clustering. I am using the
>>> > following code snippet after clustering is completed.
>>> >
>>> >        Configuration conf = new Configuration();
>>> >        FileSystem fs = FileSystem.get(conf);
>>> >        Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");
>>> >
>>> >        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
>>> >        IntWritable key = new IntWritable();
>>> >        WeightedVectorWritable value = new WeightedVectorWritable();
>>> >        while (reader.next(key, value)) {
>>> >            System.out.print(value.toString() + " is in cluster " + key.toString());
>>> >        }
>>> >        System.out.println();
>>> >
>>> > But I am getting the following error:
>>> >
>>> > SLF4J: Class path contains multiple SLF4J bindings.
>>> > SLF4J: Found binding in
>>> >
>>> [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> > SLF4J: Found binding in
>>> >
>>> [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>> > explanation.
>>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 1
>>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 2
>>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 3
>>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 4
>>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 5
>>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 6
>>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 7
>>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 8
>>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 9
>>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 10
>>> > 12/01/06 18:47:47 INFO clustering.ClusterDumper: Wrote 10 clusters
>>> > Exception in thread "main" java.io.FileNotFoundException:
>>> > /home/praneet/Eclipse-Output/output/clusteredPoints (Is a directory)
>>> >    at java.io.FileInputStream.open(Native Method)
>>> >    at java.io.FileInputStream.<init>(FileInputStream.java:137)
>>> >    at
>>> >
>>> org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:70)
>>> >    at
>>> >
>>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:106)
>>> >    at
>>> >
>>> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
>>> >    at
>>> >
>>> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>>> >    at
>>> >
>>> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>>> >    at
>>> >
>>> org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>> >    at
>>> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>> >    at
>>> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>> >    at
>>> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>> >    at org.apache.mahout.clustering.Test.main(Test.java:46)
>>> >
>>> > Any suggestions?
>>> >
>>> > On Wed, Dec 28, 2011 at 12:25 AM, petar.mitrovic <petarmitrovic@gmail.com> wrote:
>>> >
>>> >> Hi Abin,
>>> >>
>>> >> Thank you very much! Your suggestion helped me a lot.
>>> >>
>>> >> First, I passed the named-vector flag (-nv) to Mahout's vector-generation
>>> >> process (seq2sparse) so that it writes more descriptive (named) vectors.
>>> >>
>>> >> Later, I could use something like this:
>>> >>
>>> >> IntWritable key = new IntWritable();
>>> >> WeightedVectorWritable vector = new WeightedVectorWritable();
>>> >> while (reader.next(key, vector)) {
>>> >>        NamedVector nv = (NamedVector) vector.getVector();
>>> >>        System.out.println(nv.getName() + " belongs to cluster " + key.toString());
>>> >> }
>>> >>
>>> >> Hope this can be useful for someone else, too.
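(Editor's note: for completeness, a sketch of the reader setup that Petar's snippet assumes,
with the vector cast included. The path is a placeholder for one part file under
clusteredPoints, and the cast to NamedVector only works if the vectors were generated with -nv.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.WeightedVectorWritable;
import org.apache.mahout.math.NamedVector;

public class PrintNamedClusterAssignments {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Point at a single part file, not the clusteredPoints directory itself.
    Path path = new Path("output/clusteredPoints/part-m-0");

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      IntWritable key = new IntWritable();
      WeightedVectorWritable vector = new WeightedVectorWritable();
      while (reader.next(key, vector)) {
        // getVector() returns a NamedVector only when seq2sparse was run with -nv.
        NamedVector nv = (NamedVector) vector.getVector();
        System.out.println(nv.getName() + " belongs to cluster " + key);
      }
    } finally {
      reader.close();
    }
  }
}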
>>> >>
>>> >> Regards,
>>> >> Petar
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >>
>>> http://lucene.472066.n3.nabble.com/How-to-determine-which-cluster-an-item-belongs-to-tp3613013p3615979.html
>>> >> Sent from the Mahout User List mailing list archive at Nabble.com.
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Praneet Mhatre
>>> > Graduate Student
>>> > Donald Bren School of ICS
>>> > University of California, Irvine
>>>
>>
>>
>>
>> --
>> Praneet Mhatre
>> Graduate Student
>> Donald Bren School of ICS
>> University of California, Irvine
>>
>>
>
>
> --
> Praneet Mhatre
> Graduate Student
> Donald Bren School of ICS
> University of California, Irvine



-- 
Lance Norskog
goksron@gmail.com
