spark-user mailing list archives

From Fernando Velasco <fermaat...@gmail.com>
Subject k-prototypes in MLLib?
Date Mon, 19 Oct 2015 15:38:04 GMT
Hi everyone!

I am a data scientist new to Spark, and I am interested in clustering
mixed (numeric and categorical) variables. I am more used to R, where there
are implementations like daisy, PAM, etc. It is true that dummy variables
combined with K-Means can do a decent job of clustering mixed variables, but
I do not think this is a fully correct treatment of the categorical ones. So
my question is whether a k-modes/k-prototypes implementation is planned for
inclusion in MLlib in the future.

I was able to find https://issues.apache.org/jira/browse/SPARK-4510, but it
seems PAM does not scale well. Perhaps k-prototypes would be a better fit.
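
To make the idea concrete, here is a minimal sketch of the mixed
dissimilarity that k-prototypes is usually described with: squared Euclidean
distance on the numeric attributes plus a weight gamma times the number of
categorical mismatches. This is plain Python, not anything from MLlib; the
function names, the gamma default and the toy data are my own assumptions
for illustration only.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of mismatched categorical attributes."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Return the index of the closest prototype.

    `prototypes` is a list of (numeric_values, categorical_values) pairs
    (a hypothetical layout chosen for this sketch)."""
    return min(
        range(len(prototypes)),
        key=lambda i: mixed_dissimilarity(
            x_num, x_cat, prototypes[i][0], prototypes[i][1], gamma
        ),
    )


# Toy data: two numeric attributes and one categorical attribute.
prototypes = [([0.0, 0.0], ["red"]), ([5.0, 5.0], ["blue"])]
cluster = nearest_prototype([4.0, 4.0], ["blue"], prototypes)
```

Unlike dummy coding, the categorical part here is a plain mismatch count, so
gamma controls how much the categorical attributes weigh against the numeric
ones.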

Regards,
