spark-issues mailing list archives

From "Marco Gaido (JIRA)" <>
Subject [jira] [Commented] (SPARK-22440) Add Calinski-Harabasz index to ClusteringEvaluator
Date Fri, 03 Nov 2017 19:37:00 GMT


Marco Gaido commented on SPARK-22440:

Honestly, I don't know what people are using for clustering evaluation, nor do I know where
to retrieve such a statistic. My goal here was to make it easier for people to migrate their
existing workloads to Spark. Since sklearn is surely one of the most widespread libraries
for machine learning, existing workloads may evaluate an unsupervised clustering through
either Silhouette or Calinski-Harabasz. If we support both, I think adopting Spark would be
easier for them.

> Add Calinski-Harabasz index to ClusteringEvaluator
> --------------------------------------------------
>                 Key: SPARK-22440
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Marco Gaido
>            Priority: Minor
> In SPARK-14516 we introduced ClusteringEvaluator with an implementation of Silhouette.
> sklearn also contains another metric for the evaluation of unsupervised clustering results:
> Calinski-Harabasz. This JIRA is to add it to Spark.
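For reference, the Calinski-Harabasz index (also called the variance ratio criterion) compares the between-cluster dispersion B with the within-cluster dispersion W for n points in k clusters: CH = (B / (k - 1)) / (W / (n - k)). Higher values indicate denser, better-separated clusters. The sketch below is a plain-Python illustration of that formula only; the function name `calinski_harabasz` is hypothetical and is neither Spark's ClusteringEvaluator API nor sklearn's implementation. Raising an error when W is zero is an assumption made here for simplicity.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def calinski_harabasz(points: Sequence[Sequence[float]],
                      labels: Sequence[Hashable]) -> float:
    """Hypothetical helper: Calinski-Harabasz index of a labelled clustering."""
    n = len(points)
    if n != len(labels):
        raise ValueError("points and labels must have the same length")

    # Group points by cluster label.
    clusters: dict = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("the index needs 2 <= k < n clusters")

    dim = len(points[0])
    # Overall centroid of the whole data set.
    overall: List[float] = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0  # B: sum over clusters of n_j * ||c_j - c||^2
    within = 0.0   # W: sum over clusters of sum_{x in C_j} ||x - c_j||^2
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[d] for p in members) / size for d in range(dim)]
        between += size * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim))
                      for p in members)

    if within == 0.0:
        # Assumption for this sketch: treat zero within-cluster dispersion as undefined.
        raise ValueError("index undefined: every point equals its cluster centroid")
    return (between / (k - 1)) / (within / (n - k))
```

For example, two tight clusters around (0, 1) and (10, 1), given as `[[0, 0], [0, 2], [10, 0], [10, 2]]` with labels `[0, 0, 1, 1]`, give B = 100 and W = 4, so CH = (100 / 1) / (4 / 2) = 50.0.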

This message was sent by Atlassian JIRA
