flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: multiple k-means in parallel
Date Mon, 28 Nov 2016 08:37:36 GMT
Hi Lydia,

That is certainly possible, but you need to adapt the algorithm a bit.
The straightforward approach would be to replicate the input data and
assign an ID to each k-means run.
If you have a data point (1, 2, 3), you could replicate it to three data
points (10, 1, 2, 3), (15, 1, 2, 3), (20, 1, 2, 3), where the first field
is the number of centers (k) of a run and so identifies that run.
From there you need a bit of custom partitioning and composite keys to
shuffle the data to the right workers.
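
To make the idea concrete, here is a rough sketch in plain Python (not the
Flink API). It only illustrates the replication step and the use of a
composite key (run ID, center index); the function names `replicate` and
`assign` and the example centers are made up for illustration. In a real
job the grouping would be done by the framework's partitioning on that key.

```python
from collections import defaultdict


def replicate(points, ks):
    """Tag every point with each k, so one data set carries all runs.

    The first field of each output tuple is the run ID (here: k itself).
    """
    return [(k, *p) for p in points for k in ks]


def assign(tagged, centers_by_run):
    """Assign each tagged point to its nearest center within its own run.

    The result is keyed by the composite key (run ID, center index), which
    is the key one would partition/group on in a distributed setting.
    centers_by_run is a hypothetical dict: run ID -> list of center tuples.
    """
    clusters = defaultdict(list)
    for k, *coords in tagged:
        centers = centers_by_run[k]
        # Index of the center with the smallest squared Euclidean distance.
        idx = min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(coords, centers[i])),
        )
        clusters[(k, idx)].append(tuple(coords))
    return dict(clusters)


# One point replicated for the three runs from the example above.
tagged = replicate([(1, 2, 3)], [10, 15, 20])
print(tagged)
```

Each k-means iteration would then recompute the centers per (run ID,
center index) group, so the runs never mix even though they share one
data set.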

Hope that helps,
Fabian

2016-11-27 11:48 GMT+01:00 Lydia Ickler <icklerly@googlemail.com>:

> Hi,
>
> I want to run k-means with different k in parallel.
> So each worker should calculate its own k-means. Is that possible?
>
> If I do a map on a list of integers to then apply k-means I get the
> following error:
> Task not serializable
>
> I am looking forward to your answers!
> Lydia
