flink-user mailing list archives

From Stefan Bunk <stefan.b...@googlemail.com>
Subject Re: Distribute DataSet to subset of nodes
Date Mon, 14 Sep 2015 17:16:33 GMT
Hi,

actually, I am distributing my data before the program starts, without
using broadcast sets.

However, the approach should still work, under one condition:

> DataSet mapped1 =
> data.flatMap(yourMap).withBroadcastSet(smallData1,"data").setParallelism(5);
> DataSet mapped2 =
> data.flatMap(yourMap).withBroadcastSet(smallData2,"data").setParallelism(5);
>
Is it guaranteed that this selects disjoint sets of nodes, i.e. five
nodes for mapped1 and five other nodes for mapped2?

Is there any way to choose the five nodes explicitly? Currently, I have
stored the first half of the data on nodes 1-5 and the second half on nodes
6-10. With this approach, I guess the nodes are selected randomly, so I would
have to copy both halves to all of the nodes.

Best,
Stefan
