mahout-user mailing list archives

From Madhusudan Joshi <madhusudanrjo...@gmail.com>
Subject Re: Check the input files present in cluster
Date Wed, 06 Apr 2011 03:53:28 GMT
The command I used for the cluster dump is:

mahout clusterdump -s mytest/kmeans/clusters-1 \
    -p mytest/kmeans/clusteredPoints \
    -d mytest/seqdir-sparse/dictionary.file-0 -dt sequencefile \
    -n 20 -o Desktop/ClusterDump/Kmeans/cl1.txt

I tried the Reuters example and then ran the clustering on my own sample
files. The output for my sample files is:

CL-0{n=2 c=[article:3.009, first:3.279, third:3.279] r=[first:3.279,
third:3.279]}
    Top Terms:
        third                                   =>  3.2787654399871826
        first                                   =>  3.2787654399871826
        article                                 =>  3.0087521076202393
    Weight:  Point:
    1.0: [article:3.009, first:6.558]
    1.0: [article:3.009, third:6.558]
VL-1{n=1 c=[article:3.009, second:6.558] r=[article:0.000, first:0.000,
fourth:0.000, second:0.000, third:0.000]}
    Top Terms:
        second                                  =>   6.557530879974365
        article                                 =>  3.0087521076202393
    Weight:  Point:
    1.0: [article:3.009, second:6.558]
VL-3{n=1 c=[article:3.009, fourth:6.558] r=[article:0.000, first:0.000,
fourth:0.000, second:0.000, third:0.000]}
    Top Terms:
        fourth                                  =>   6.557530879974365
        article                                 =>  3.0087521076202393
    Weight:  Point:
    1.0: [article:3.009, fourth:6.558]

The output shows the number of documents in each cluster (the n= value) and
the term vector of each point, but it does not say which documents those
points are. I need to be able to check which documents are present in any
given cluster.
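
To make concrete what the dump does and does not contain, here is a minimal Python sketch that groups the point lines under the cluster header that precedes them. It assumes the text layout shown above (a header such as CL-0{n=2 ..., followed by indented "weight: [term:value, ...]" lines). The function name parse_cluster_dump and the regular expressions are illustrative and not part of Mahout. It only recovers the term vectors, because document names do not appear anywhere in this output.

```python
import re

# Cluster header, e.g. "CL-0{n=2 c=[...". The CL-/VL- prefixes and the n= count
# are taken from the dump shown above; other Mahout versions may differ.
CLUSTER_RE = re.compile(r"^(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\b")
# Point line, e.g. "    1.0: [article:3.009, first:6.558]".
POINT_RE = re.compile(r"^\s+(?P<weight>\d+(?:\.\d+)?):\s*\[(?P<terms>.*)\]\s*$")


def parse_cluster_dump(text):
    """Return {cluster_id: {"n": declared_count, "points": [term dicts]}}."""
    clusters = {}
    current = None
    for line in text.splitlines():
        header = CLUSTER_RE.match(line)
        if header:
            current = header.group("id")
            clusters[current] = {"n": int(header.group("n")), "points": []}
            continue
        point = POINT_RE.match(line)
        if point and current is not None:
            terms = {}
            for pair in point.group("terms").split(","):
                pair = pair.strip()
                if not pair:
                    continue  # tolerate an empty vector "[]"
                term, _, value = pair.rpartition(":")
                terms[term] = float(value)
            clusters[current]["points"].append(terms)
    return clusters


if __name__ == "__main__":
    # Path taken from the -o option of the command above.
    with open("Desktop/ClusterDump/Kmeans/cl1.txt") as handle:
        for cluster_id, info in parse_cluster_dump(handle.read()).items():
            print(cluster_id, info["n"], info["points"])
```

Run on the output above, this lists two points under CL-0 and one each under VL-1 and VL-3, but every point is still only a bag of weighted terms with no file name attached.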

On Tue, Apr 5, 2011 at 11:34 PM, Jeff Eastman <jeastman@narus.com> wrote:

> You are going to have to be much more explicit in terms of what command
> line invocations you did and what results you got in order for anybody to be
> able to help you much here. Have you tried the clustering examples in the
> wiki?
>
> -----Original Message-----
> From: Madhusudan Joshi [mailto:madhusudanrjoshi@gmail.com]
> Sent: Monday, April 04, 2011 10:23 PM
> To: user@mahout.apache.org
> Subject: Check the input files present in cluster
>
> Hi,
>
> I am new to Mahout and trying out clustering. I created clusters using
> kmeans in bash. I want to know which files are present in a given cluster.
> I tried looking for it in the cluster dumper but didn't find the required
> solution. Can anyone help me with this?
>
> Thanks.
>
> --
> Everything we hear is an opinion, not a fact.
> Everything we see is perspective, not the truth.
>



-- 
Everything we hear is an opinion, not a fact.
Everything we see is perspective, not the truth.
