flink-user mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: Union of multiple datasets vs Join
Date Mon, 22 Dec 2014 11:46:46 GMT
Follow the first approach (union, then group by).
Joins are expensive; union comes for free.

Best, Fabian

2014-12-22 11:47 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:

> Hi guys,
>
> In my use case I have multiple Datasets with the same structure (e.g.
> Tuple3) and I want to produce an output Dataset containing all Tuple3s
> grouped by the first field (0).
> I can obtain the same result either by performing a union of all datasets
> followed by a group by (the simplest implementation) or by joining all of
> them pairwise ((((A->B)->C)->D)...). I don't know if there is any other
> solution. When should I use the first approach and when the second? Could
> you help me understand the internals of the two approaches? I am always
> wary of chaining multiple joins when I don't know their exact sizes.
>
> Best,
> Flavio
>
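
The "first approach" from the thread (union all inputs, then group by field 0) can be modelled with a minimal plain-Java sketch. This is not Flink code: the Tuple3 class, the UnionThenGroup class, and the sample values are hypothetical stand-ins chosen for illustration. In the Flink DataSet API the corresponding calls would be union(...) followed by groupBy(0). The sketch only shows why the union step involves no matching work: records are concatenated, and all key handling happens once, in the grouping step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "union, then group by field 0". Not Flink code.
class UnionThenGroup {

    // Hypothetical stand-in for a three-field tuple; f0 is the grouping key.
    static final class Tuple3 {
        final String f0;
        final int f1;
        final String f2;

        Tuple3(String f0, int f1, String f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }
    }

    // Union: concatenate the inputs without comparing or matching any records.
    static List<Tuple3> union(List<List<Tuple3>> datasets) {
        List<Tuple3> all = new ArrayList<>();
        for (List<Tuple3> dataset : datasets) {
            all.addAll(dataset);
        }
        return all;
    }

    // Group by field 0 in a single pass over the unioned records.
    static Map<String, List<Tuple3>> groupByFirstField(List<Tuple3> records) {
        Map<String, List<Tuple3>> groups = new TreeMap<>();
        for (Tuple3 t : records) {
            groups.computeIfAbsent(t.f0, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Three small sample inputs with the same structure (made-up values).
        List<Tuple3> a = Arrays.asList(new Tuple3("k1", 1, "a"), new Tuple3("k2", 2, "a"));
        List<Tuple3> b = Arrays.asList(new Tuple3("k1", 3, "b"));
        List<Tuple3> c = Arrays.asList(new Tuple3("k3", 4, "c"));

        Map<String, List<Tuple3>> groups =
                groupByFirstField(union(Arrays.asList(a, b, c)));

        // Print each key with the number of records grouped under it.
        groups.forEach((key, members) ->
                System.out.println(key + " -> " + members.size() + " record(s)"));
    }
}
```

Adding a further input only lengthens the list passed to union; the grouping logic stays the same, whereas a pairwise join chain would need one more join per input.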
