flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Left outer join
Date Thu, 16 Apr 2015 13:23:33 GMT
That solution works if you can define a NULL_ELEMENT, but not if you need
the full value range of Integer, because then no value is left over to
serve as the placeholder.

This is something that we need to solve a bit differently,
maybe by adding optional support for null-valued fields to Tuple.
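
To make the limitation concrete, here is a minimal plain-Java sketch that does not depend on Flink. It assumes java.util.Optional as a stand-in for a null-capable tuple field. The class and method names are hypothetical and are not part of any Flink API. The method joins the values of a single key group and marks a missing right side with Optional.empty() instead of a reserved Integer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration, not Flink API: Optional marks a missing right side.
class NullableOuterJoinSketch {

    // Joins the values of one key group. Every left value is paired with every
    // right value. If the right side is empty, the left value is paired with
    // Optional.empty() rather than a sentinel such as -1.
    static List<Map.Entry<Integer, Optional<Integer>>> leftOuterJoinGroup(
            List<Integer> leftValues, List<Integer> rightValues) {
        List<Map.Entry<Integer, Optional<Integer>>> out = new ArrayList<>();
        for (Integer left : leftValues) {
            if (rightValues.isEmpty()) {
                out.add(new SimpleEntry<>(left, Optional.<Integer>empty()));
            }
            for (Integer right : rightValues) {
                out.add(new SimpleEntry<>(left, Optional.of(right)));
            }
        }
        return out;
    }
}
```

With this representation, an unmatched left value 7 yields (7, Optional.empty()), while a genuine match against -1 yields (7, Optional.of(-1)). The two stay distinguishable, which a -1 placeholder cannot guarantee.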


2015-04-15 5:59 GMT-05:00 Maximilian Michels <mxm@apache.org>:

> Hi Flavio,
>
> Here's a simple example of a Left Outer Join:
> https://gist.github.com/mxm/c2e9c459a9d82c18d789
>
> As Stephan pointed out, this can be very easily modified to construct a
> Right Outer Join (just exchange leftElements and rightElements in the two
> loops).
>
> Here's an excerpt with the most important part, the coGroup function:
>
> public static class LeftOuterJoin implements CoGroupFunction<Tuple2<Integer, String>,
>       Tuple2<Integer, String>, Tuple2<Integer, Integer>> {
>
>    @Override
>    public void coGroup(Iterable<Tuple2<Integer, String>> leftElements,
>                        Iterable<Tuple2<Integer, String>> rightElements,
>                        Collector<Tuple2<Integer, Integer>> out) throws Exception {
>
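>       final int NULL_ELEMENT = -1;
>
>       // Buffer the right-side keys first: the iterables passed to coGroup
>       // may only be traversed once, but the inner loop below runs once
>       // per left element.
>       java.util.List<Integer> rightKeys = new java.util.ArrayList<Integer>();
>       for (Tuple2<Integer, String> rightElem : rightElements) {
>          rightKeys.add(rightElem.f0);
>       }
>
>       for (Tuple2<Integer, String> leftElem : leftElements) {
>          if (rightKeys.isEmpty()) {
>             // No match on the right: emit the left key with the placeholder.
>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, NULL_ELEMENT));
>          }
>          for (Integer rightKey : rightKeys) {
>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, rightKey));
>          }
>       }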
>
>    }
> }
>
>
>
> On Wed, Apr 15, 2015 at 11:01 AM, Stephan Ewen <sewen@apache.org> wrote:
>
>> I think this may be a great example to add as a utility function.
>>
>> Or actually add as a function to the DataSet, internally realized as a
>> special case of coGroup.
>>
>> We do not have a ready example of that, but it should be straightforward
>> to realize. As with the join, coGroup on the join keys. Inside the
>> coGroup function, emit the combination of all values from the two
>> iterators. If one of them is empty (the one that is not outer), then emit
>> all values from the outer side.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Wed, Apr 15, 2015 at 10:36 AM, Flavio Pompermaier <
>> pompermaier@okkam.it> wrote:
>>
>>> Do you have an already working example of it? :)
>>>
>>>
>>> On Wed, Apr 15, 2015 at 10:32 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>
>>>>
>>>> On 15 Apr 2015, at 10:30, Flavio Pompermaier <pompermaier@okkam.it>
>>>> wrote:
>>>>
>>>> >
>>>> > Hi to all,
>>>> > I have to join two datasets but I'd like to keep all data in the left
>>>> > even if there's no matching data in the right dataset.
>>>> > How can you achieve that in Flink? Maybe I should use coGroup?
>>>>
>>>> Yes, currently you have to implement this manually with a coGroup.
>>>
>>>
>>>
>>
>
