flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Left outer join
Date Thu, 16 Apr 2015 19:48:00 GMT
You can materialize the right input, for example by copying its elements into
an array. Then you can iterate over it as many times as you need.
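Here is a minimal sketch of this idea. It is written in plain Java so it runs without Flink on the classpath. `OnceIterable` is a hypothetical stand-in for the traverse-once `Iterable` that Flink passes to `coGroup`. `leftOuterJoin` mirrors the body of the `LeftOuterJoin.coGroup` function quoted below, with a `BiConsumer` standing in for Flink's `Collector`. The only real change is that the right group is copied into an `ArrayList` once. That list, not the original `Iterable`, is then re-read for every left element.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class LeftOuterJoinSketch {

    // Hypothetical stand-in for the traverse-once Iterable that Flink passes to
    // coGroup: the second call to iterator() throws, like the exception
    // reported in this thread.
    public static final class OnceIterable<T> implements Iterable<T> {
        private final List<T> data;
        private boolean used = false;

        public OnceIterable(List<T> data) {
            this.data = data;
        }

        @Override
        public Iterator<T> iterator() {
            if (used) {
                throw new IllegalStateException(
                        "The Iterable can be iterated over only once.");
            }
            used = true;
            return data.iterator();
        }
    }

    // Left outer join for a single key group, mirroring LeftOuterJoin.coGroup;
    // the BiConsumer plays the role of Flink's Collector.
    public static <L, R> void leftOuterJoin(Iterable<L> leftElements,
                                            Iterable<R> rightElements,
                                            BiConsumer<L, R> out) {
        // Materialize the right group: this is the only iterator() call on it.
        List<R> rightList = new ArrayList<>();
        for (R right : rightElements) {
            rightList.add(right);
        }

        // The left group is traversed exactly once, in the outer loop.
        for (L left : leftElements) {
            if (rightList.isEmpty()) {
                // No matching right element: emit the left element with a null partner.
                out.accept(left, null);
            } else {
                // Re-read the materialized list, not the original Iterable.
                for (R right : rightList) {
                    out.accept(left, right);
                }
            }
        }
    }
}
```

Keep a few caveats in mind when moving this into a real `CoGroupFunction`:

- The copied group is held in memory, so this assumes each key's right-hand group fits on the heap.
- If object reuse is enabled (`ExecutionConfig#enableObjectReuse()`), Flink may hand out the same instance for successive elements. Store copies of the elements, not the references.
- Flink's tuple serializer does not accept null fields. A real job would therefore keep a sentinel such as the `NULL_ELEMENT` value used in the quoted code, rather than emitting `null`.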

Cheers,
Till
On Apr 16, 2015 7:37 PM, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:

> Hi Maximilian,
> I tried your solution but it doesn't work because the rightElements
> iterator cannot be used more than once:
>
> Caused by: org.apache.flink.util.TraversableOnceException: The Iterable
> can be iterated over only once. Only the first call to 'iterator()' will
> succeed.
>
> On Wed, Apr 15, 2015 at 12:59 PM, Maximilian Michels <mxm@apache.org>
> wrote:
>
>> Hi Flavio,
>>
>> Here's a simple example of a Left Outer Join:
>> https://gist.github.com/mxm/c2e9c459a9d82c18d789
>>
>> As Stephan pointed out, this can be very easily modified to construct a
>> Right Outer Join (just exchange leftElements and rightElements in the two
>> loops).
>>
>> Here's an excerpt with the most important part, the coGroup function:
>>
>> public static class LeftOuterJoin implements CoGroupFunction<Tuple2<Integer, String>,
>>       Tuple2<Integer, String>, Tuple2<Integer, Integer>> {
>>
>>    @Override
>>    public void coGroup(Iterable<Tuple2<Integer, String>> leftElements,
>>                        Iterable<Tuple2<Integer, String>> rightElements,
>>                        Collector<Tuple2<Integer, Integer>> out) throws Exception {
>>
>>       final int NULL_ELEMENT = -1;
>>
>>       for (Tuple2<Integer, String> leftElem : leftElements) {
>>          boolean hadElements = false;
>>          for (Tuple2<Integer, String> rightElem : rightElements) {
>>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, rightElem.f0));
>>             hadElements = true;
>>          }
>>          if (!hadElements) {
>>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, NULL_ELEMENT));
>>          }
>>       }
>>
>>    }
>> }
>>
>>
>>
>> On Wed, Apr 15, 2015 at 11:01 AM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> I think this may be a great example to add as a utility function.
>>>
>>> Or actually add it as a function to the DataSet, internally realized as a
>>> special case of coGroup.
>>>
>>> We do not have a ready-made example of that, but it should be straightforward
>>> to realize. As for the join, coGroup on the join keys. Inside the
>>> coGroup function, emit all combinations of the values from the two
>>> iterators. If one of them is empty (the one that is not outer), then emit
>>> all values from the outer side.
>>>
>>> Greetings,
>>> Stephan
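Stephan's recipe can be illustrated without a cluster by emulating coGroup on in-memory lists. The following is a minimal sketch under two assumptions: a hypothetical `Row` type holding an int key and a String value, and sorted key order so the output is reproducible. It is not Flink's implementation. It only reproduces the semantics that matter here: coGroup makes one call per key that appears in either input and passes an empty group for the missing side, which an inner join never does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CoGroupOuterJoinSketch {

    // Hypothetical record type: a join key plus a payload.
    public static final class Row {
        final int key;
        final String value;

        public Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Groups rows by key; TreeMap keeps keys sorted so the output order is deterministic.
    static Map<Integer, List<Row>> groupByKey(List<Row> rows) {
        Map<Integer, List<Row>> groups = new TreeMap<>();
        for (Row row : rows) {
            groups.computeIfAbsent(row.key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // Emulates coGroup on the join key followed by left-outer-join logic.
    // Every key present in EITHER input is visited; the missing side is an empty group.
    public static List<String> leftOuterJoin(List<Row> left, List<Row> right) {
        Map<Integer, List<Row>> leftGroups = groupByKey(left);
        Map<Integer, List<Row>> rightGroups = groupByKey(right);

        TreeSet<Integer> keys = new TreeSet<>(leftGroups.keySet());
        keys.addAll(rightGroups.keySet());

        List<String> out = new ArrayList<>();
        for (int key : keys) {
            List<Row> lefts = leftGroups.getOrDefault(key, Collections.emptyList());
            List<Row> rights = rightGroups.getOrDefault(key, Collections.emptyList());
            for (Row l : lefts) {
                if (rights.isEmpty()) {
                    // The non-outer (right) side is empty: keep the left row, padded with null.
                    out.add(key + ":" + l.value + ":null");
                } else {
                    // Both sides present: emit every combination, as an inner join would.
                    for (Row r : rights) {
                        out.add(key + ":" + l.value + ":" + r.value);
                    }
                }
            }
            // Keys that occur only on the right have an empty left group, so nothing is emitted.
        }
        return out;
    }
}
```

Swapping the two inputs yields the right outer join that Maximilian mentions.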
>>>
>>>
>>> On Wed, Apr 15, 2015 at 10:36 AM, Flavio Pompermaier <
>>> pompermaier@okkam.it> wrote:
>>>
>>>> Do you already have a working example of it? :)
>>>>
>>>>
>>>> On Wed, Apr 15, 2015 at 10:32 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>>
>>>>>
>>>>> On 15 Apr 2015, at 10:30, Flavio Pompermaier <pompermaier@okkam.it>
>>>>> wrote:
>>>>>
>>>>> >
>>>>> > Hi to all,
>>>>> > I have to join two datasets, but I'd like to keep all data from the
>>>>> > left one even if there is no matching element in the right one.
>>>>> > How can you achieve that in Flink? Maybe I should use coGroup?
>>>>>
>>>>> Yes, currently you have to implement this manually with a coGroup.
>>>>
>>>>
>>>>
>>>
>>
>
>
