From Flavio Pompermaier <pomperma...@okkam.it>
Subject Union of multiple datasets vs Join
Date Mon, 22 Dec 2014 10:47:47 GMT
Hi guys,

In my use case I have multiple datasets with the same structure (e.g.
Tuple3) and I want to produce an output dataset containing all the Tuple3s
grouped by the first field (0).
I can obtain the same result either by performing a union of all the
datasets followed by a group by (the simplest implementation), or by
joining them pairwise (((A->B)->C)->D ...). I don't know whether there is
any other solution. When should I use the first approach and when the
second? Could you help me understand the internals of the two approaches?
I'm always a bit wary of chaining multiple joins when I don't know their
exact size.
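
To make the first approach concrete, here is a minimal plain-Java sketch of
the result that "union, then group by field 0" is meant to produce. It is
only a model built on java.util collections, not the Flink API: the class
UnionThenGroup, the nested Tuple3 stand-in and the method unionAndGroup are
hypothetical names made up for illustration, and the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnionThenGroup {

    // Hypothetical stand-in for a Tuple3; the field names mirror f0, f1, f2.
    static final class Tuple3 {
        final String f0;
        final String f1;
        final int f2;

        Tuple3(String f0, String f1, int f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + "," + f2 + ")";
        }
    }

    // Walk over every input in turn (the "union"), then bucket each tuple
    // by its first field (the "group by 0"). A TreeMap keeps keys sorted.
    static Map<String, List<Tuple3>> unionAndGroup(List<List<Tuple3>> datasets) {
        Map<String, List<Tuple3>> groups = new TreeMap<>();
        for (List<Tuple3> dataset : datasets) {
            for (Tuple3 t : dataset) {
                groups.computeIfAbsent(t.f0, k -> new ArrayList<>()).add(t);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Invented sample data: three inputs with the same structure.
        List<Tuple3> a = Arrays.asList(new Tuple3("k1", "a", 1), new Tuple3("k2", "a", 2));
        List<Tuple3> b = Arrays.asList(new Tuple3("k1", "b", 3));
        List<Tuple3> c = Arrays.asList(new Tuple3("k3", "c", 4));

        unionAndGroup(Arrays.asList(a, b, c))
                .forEach((key, tuples) -> System.out.println(key + " -> " + tuples));
    }
}
```

In Flink itself the same idea would presumably be written along the lines of
a.union(b).union(c).groupBy(0), using the union and groupBy operations of the
DataSet API. Note that in this model a key that appears in only one input
still ends up in the output, which would not be the case for a chain of
inner joins.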


