flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Left outer join
Date Thu, 16 Apr 2015 14:36:33 GMT
>
> This is something that we need to solve a bit differently.
> Maybe by adding optional null-valued field support to Tuple.
>

+1

That was just a proof of concept. I agree, for a proper implementation, one
would need to differentiate between a regular element and a NULL element.
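
To make that distinction concrete, here is a minimal sketch in plain Java (no Flink dependency) of one way to tell a matched value from a missing one: the right side is wrapped in java.util.Optional instead of being encoded as a sentinel such as -1. The names LeftOuterJoinSketch, Joined, and joinGroup are hypothetical and only illustrate the idea; this is not an existing Flink API and not the null-valued Tuple support proposed above. The sketch also buffers the right side once, on the assumption that the group iterables handed to a coGroup function can only be traversed a single time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of Flink.
final class LeftOuterJoinSketch {

    /** One joined row: the left value plus the right value, which is empty when nothing matched. */
    static final class Joined {
        final int left;
        final Optional<Integer> right;

        Joined(int left, Optional<Integer> right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            String r = right.isPresent() ? String.valueOf(right.get()) : "NULL";
            return "(" + left + ", " + r + ")";
        }
    }

    /** Mimics the body of a coGroup function: both arguments hold the elements of one key group. */
    static List<Joined> joinGroup(Iterable<Integer> leftElements, Iterable<Integer> rightElements) {
        // Buffer the right side once so it can be replayed for every left element
        // (assumption: the incoming iterable supports only a single traversal).
        List<Integer> rightBuffer = new ArrayList<>();
        for (Integer r : rightElements) {
            rightBuffer.add(r);
        }

        List<Joined> out = new ArrayList<>();
        for (Integer l : leftElements) {
            if (rightBuffer.isEmpty()) {
                // No match: emit the left element with an explicitly absent right side.
                out.add(new Joined(l, Optional.empty()));
            } else {
                for (Integer r : rightBuffer) {
                    out.add(new Joined(l, Optional.of(r)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // -1 is an ordinary value here, so it cannot be confused with "no match".
        System.out.println(joinGroup(Arrays.asList(-1, -1), Arrays.asList(-1)));
        System.out.println(joinGroup(Arrays.asList(7), new ArrayList<Integer>()));
    }
}
```

With this representation the full value range of Integer stays usable on the right side, at the cost of an extra wrapper object per emitted row.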

On Thu, Apr 16, 2015 at 3:23 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> That solution works if you can define a NULL_ELEMENT but not if you want
> to use the full value range of Integer.
>
> This is something that we need to solve a bit differently.
> Maybe by adding optional null-valued field support to Tuple.
>
>
> 2015-04-15 5:59 GMT-05:00 Maximilian Michels <mxm@apache.org>:
>
>> Hi Flavio,
>>
>> Here's a simple example of a Left Outer Join:
>> https://gist.github.com/mxm/c2e9c459a9d82c18d789
>>
>> As Stephan pointed out, this can be very easily modified to construct a
>> Right Outer Join (just exchange leftElements and rightElements in the two
>> loops).
>>
>> Here's an excerpt with the most important part, the coGroup function:
>>
>> public static class LeftOuterJoin implements
>>       CoGroupFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, Tuple2<Integer, Integer>> {
>>
>>    @Override
>>    public void coGroup(Iterable<Tuple2<Integer, String>> leftElements,
>>                        Iterable<Tuple2<Integer, String>> rightElements,
>>                        Collector<Tuple2<Integer, Integer>> out) throws Exception {
>>
>>       final int NULL_ELEMENT = -1;
>>
>>       for (Tuple2<Integer, String> leftElem : leftElements) {
>>          boolean hadElements = false;
>>          for (Tuple2<Integer, String> rightElem : rightElements) {
>>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, rightElem.f0));
>>             hadElements = true;
>>          }
>>          if (!hadElements) {
>>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, NULL_ELEMENT));
>>          }
>>       }
>>
>>    }
>> }
>>
>>
>>
>> On Wed, Apr 15, 2015 at 11:01 AM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> I think this may be a great example to add as a utility function.
>>>
>>> Or actually add it as a function to the DataSet, internally realized as a
>>> special case of coGroup.
>>>
>>> We do not have a ready example of that, but it should be straightforward
>>> to realize. As with the join, coGroup on the join keys. Inside the
>>> coGroup function, emit the combination of all values from the two
>>> iterators. If one of them is empty (the one that is not outer) then emit
>>> all values from the outer side.
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Wed, Apr 15, 2015 at 10:36 AM, Flavio Pompermaier <
>>> pompermaier@okkam.it> wrote:
>>>
>>>> Do you have an already working example of it? :)
>>>>
>>>>
>>>> On Wed, Apr 15, 2015 at 10:32 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>>
>>>>>
>>>>> On 15 Apr 2015, at 10:30, Flavio Pompermaier <pompermaier@okkam.it>
>>>>> wrote:
>>>>>
>>>>> >
>>>>> > Hi to all,
>>>>> > I have to join two datasets but I'd like to keep all data in the
>>>>> > left even if there's no right dataset.
>>>>> > How can you achieve that in Flink? Maybe I should use coGroup?
>>>>>
>>>>> Yes, currently you have to implement this manually with a coGroup
>>>>
>>>>
>>>>
>>>
>>
>
