mahout-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: Interpretation of cluster output
Date Fri, 13 Jun 2014 15:54:39 GMT
That's going to be easier if you can work off trunk, since the clustering
output has been cleaned up to write a better format, per
https://issues.apache.org/jira/browse/MAHOUT-1505

E.g., in the record below the points assigned to the cluster are listed
under "points":

{
  "top_terms": [
    {"all":3.0149030685424805},
    {"english":3.0149030685424805},
    {"best":3.0149030685424805},
    {"spaniel":3.0149030685424805},
    {"springer":3.0149030685424805},
    {"dogs":1.9162907600402832}
  ],
  "cluster_id": 7,
  "cluster": {
    "r": [],
    "c": [
      {"all":3.015},
      {"best":3.015},
      {"dogs":1.916},
      {"english":3.015},
      {"spaniel":3.015},
      {"springer":3.015}
    ],
    "n": 1,
    "identifier": "C-7"
  },
  "points": [
    {
      "point": [
        {"all":3.015},
        {"best":3.015},
        {"dogs":1.916},
        {"english":3.015},
        {"spaniel":3.015},
        {"springer":3.015}
      ],
      "vector_name": "P(14)",
      "weight": "1.0"
    }
  ]
}
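
To pull the member points out of a record like the one above, something along these lines should work. This is only a sketch: it assumes each cluster arrives as one JSON object shaped like the sample, the record here is trimmed to the fields it reads, and `points_in_cluster` is a made-up helper name, not part of Mahout.

```python
import json

# Trimmed copy of the sample record above; only the fields read below are kept.
record_text = """
{
  "cluster_id": 7,
  "cluster": {"n": 1, "identifier": "C-7"},
  "points": [
    {
      "point": [
        {"all": 3.015},
        {"best": 3.015},
        {"dogs": 1.916},
        {"english": 3.015},
        {"spaniel": 3.015},
        {"springer": 3.015}
      ],
      "vector_name": "P(14)",
      "weight": "1.0"
    }
  ]
}
"""


def points_in_cluster(record):
    """Return (cluster_id, [(vector_name, {term: weight}), ...]).

    Hypothetical helper: assumes the record layout shown in the sample.
    """
    members = []
    for p in record.get("points", []):
        # Each "point" is a list of single-entry {term: weight} objects,
        # so merge them into one dict per point.
        terms = {}
        for entry in p["point"]:
            terms.update(entry)
        members.append((p["vector_name"], terms))
    return record["cluster_id"], members


record = json.loads(record_text)
cluster_id, members = points_in_cluster(record)
for name, terms in members:
    print(cluster_id, name, sorted(terms))
```

How several cluster records are laid out in one output file (one object per line, an array, etc.) is not shown in the sample, so reading a whole file would need adjusting to whatever your build actually writes.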


On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <kamesh.hadoop@gmail.com> wrote:

> Hi All,
> Please help me get the data points inside each cluster.
> The output of the clustering algorithm is the center and the radius of
> each cluster. How do we derive the actual data points inside each cluster
> from this output?
>
> --
> Kamesh.
>
