spark-dev mailing list archives

From Georgios Samaras <georgesamaras...@gmail.com>
Subject Re: Is Spark's KMeans unable to handle bigdata?
Date Sat, 03 Sep 2016 17:31:58 GMT
Thank you very much, Sean! If you would like, you could post this as an answer
to the Stack Overflow question:
[Is Spark's kMeans unable to handle bigdata?](http://stackoverflow.com/questions/39260820/is-sparks-kmeans-unable-to-handle-bigdata).

Enjoy your weekend,
George

On Sat, Sep 3, 2016 at 1:22 AM, Sean Owen <sowen@cloudera.com> wrote:

> I opened https://issues.apache.org/jira/browse/SPARK-17389 to track
> some improvements, but by far the biggest one is that the number of init
> steps defaults to 5, whereas the paper says that 2 is pretty much optimal here.
> It's much faster with that setting.
>
> On Fri, Sep 2, 2016 at 6:45 PM, Georgios Samaras
> <georgesamarasdit@gmail.com> wrote:
> > I am not using the "runs" parameter anyway, but I see your point. If you
> > could point out any modifications in the minimal example I posted, I would
> > be more than interested to try them!
> >
>
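
For readers following the thread, here is a minimal sketch of the setting Sean describes. It assumes the RDD-based PySpark API (`pyspark.mllib.clustering.KMeans.train`) and its `initializationSteps` keyword. The helper names `kmeans_kwargs` and `train_with_fewer_init_steps`, and the `maxIterations` value, are illustrative and do not come from the thread or from the minimal example it mentions.

```python
def kmeans_kwargs(k, init_steps=2):
    """Build keyword arguments for pyspark.mllib.clustering.KMeans.train.

    init_steps=2 follows the suggestion in the thread; the default at the
    time of the thread was 5. maxIterations=20 is an arbitrary placeholder.
    """
    return {
        "k": k,
        "maxIterations": 20,
        "initializationMode": "k-means||",
        "initializationSteps": init_steps,
    }


def train_with_fewer_init_steps(rdd, k):
    """Train KMeans on an RDD of feature vectors with fewer init steps."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.mllib.clustering import KMeans

    return KMeans.train(rdd, **kmeans_kwargs(k))
```

With an existing RDD of feature vectors, `train_with_fewer_init_steps(rdd, k)` would return the trained model. Whether this speeds up a particular job depends on the data and the cluster, so it is worth timing against the default.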
