mahout-user mailing list archives

From Kamesh <kamesh.had...@gmail.com>
Subject Re: Interpretation of cluster output
Date Mon, 16 Jun 2014 07:44:33 GMT
Thanks for the response, Andrew. I am using Mahout 0.9. I also tried the
trunk version, but I still get output in the following format:

C-55{n=1 c=[15993058.000] r=[]}
C-56{n=2 c=[15993061.167] r=[]}
C-57{n=1 c=[15993062.000] r=[]}

C-97{n=1 c=[15993103.000] r=[]}
C-98{n=2 c=[15993119.333] r=[0.395]}
C-99{n=1 c=[15993105.000] r=[]}

so I cannot figure out which data points are inside each cluster.
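
For reference, each line of that text output can be split into its fields
with a short script. This is only a minimal sketch: it assumes the
C-<id>{n=... c=[...] r=[...]} layout shown above with comma-separated dense
values, and parse_cluster_line is a made-up helper name, not part of Mahout.
It recovers the cluster id, point count, center and radius, but not the
member points, since those are not present in these lines.

```python
import re

# Matches one line of the text dump, e.g. C-98{n=2 c=[15993119.333] r=[0.395]}
LINE_RE = re.compile(
    r"^(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)


def parse_vector(text):
    """Turn '1.0, 2.0' into [1.0, 2.0]; an empty string becomes []."""
    return [float(v) for v in text.split(",") if v.strip()]


def parse_cluster_line(line):
    """Return the fields of one dump line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "id": m.group("id"),
        "n": int(m.group("n")),
        "center": parse_vector(m.group("c")),
        "radius": parse_vector(m.group("r")),
    }


if __name__ == "__main__":
    print(parse_cluster_line("C-98{n=2 c=[15993119.333] r=[0.395]}"))
```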

Also, when I run with "-of JSON", I get an NPE:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
    at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
    at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
    at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)

I am running the cluster dump as follows:

hadoop jar mahout-integration-1.0-SNAPSHOT.jar \
  org.apache.mahout.utils.clustering.ClusterDumper \
  -i /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000

I have also observed that the *part* file created inside *clusteredPoints*
is empty.

Please help me get the data points from each cluster.
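
Once the JSON dump works, the points could probably be pulled out per
cluster along these lines. This is only a sketch under assumptions: it
assumes the dump holds one JSON object per line, shaped like the example in
Andrew's reply quoted below (a "cluster" object with an "identifier", and a
"points" list whose entries carry a "vector_name"). points_by_cluster is a
hypothetical helper name, not a Mahout API.

```python
import json


def points_by_cluster(json_lines):
    """Map each cluster identifier to the vector names of its points."""
    result = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        cluster_id = record["cluster"]["identifier"]
        # A cluster with no "points" key is treated as having no members.
        names = [p.get("vector_name") for p in record.get("points", [])]
        result[cluster_id] = names
    return result


if __name__ == "__main__":
    # Trimmed-down record modeled on the quoted example, not real output.
    sample = (
        '{"cluster_id": 7, "cluster": {"identifier": "C-7", "n": 1}, '
        '"points": [{"vector_name": "P(14)", "weight": "1.0"}]}'
    )
    print(points_by_cluster([sample]))
```

If the "points" list comes back empty for every cluster, that would be
consistent with the empty clusteredPoints part file mentioned above.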


On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> That's going to be easier if you can work off of trunk, since the output of
> clustering has been cleaned up to write a better format, per
> https://issues.apache.org/jira/browse/MAHOUT-1505
>
> E.g.,
>
> {
>   "top_terms": [
>     {"all":3.0149030685424805},
>     {"english":3.0149030685424805},
>     {"best":3.0149030685424805},
>     {"spaniel":3.0149030685424805},
>     {"springer":3.0149030685424805},
>     {"dogs":1.9162907600402832}
>   ],
>   "cluster_id": 7,
>   "cluster": {
>     "r": [],
>     "c": [
>       {"all":3.015},
>       {"best":3.015},
>       {"dogs":1.916},
>       {"english":3.015},
>       {"spaniel":3.015},
>       {"springer":3.015}
>     ],
>     "n": 1,
>     "identifier": "C-7"
>   },
>   "points": [
>     {
>       "point": [
>         {"all":3.015},
>         {"best":3.015},
>         {"dogs":1.916},
>         {"english":3.015},
>         {"spaniel":3.015},
>         {"springer":3.015}
>       ],
>       "vector_name": "P(14)",
>       "weight": "1.0"
>     }
>   ]
> }
>
>
> On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <kamesh.hadoop@gmail.com> wrote:
>
> > Hi All,
> > Please help me in getting the data points inside each cluster.
> > The output of the clustering algorithm is the center and radius of each
> > cluster. How do we derive the actual data points inside each cluster
> > from this output?
> >
> > --
> > Kamesh.
> >
>



-- 
Kamesh.
