flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Left outer join
Date Thu, 16 Apr 2015 20:09:17 GMT
I cannot find a solution to my use case :(
I have 2 datasets D1 and D2 like:

D1:
A,p1,a1
A,p2,a2
A,p3,X
B,p3,Y
B,p1,b1

D2:
X,s,V
X,r,2
Y,j,k

I'd like to have a unique dataset D3(Tuple4) like

A,X,a1,a2
B,Y,b1,null

Basically filling with <D1.f0,D2.f0,D1.f2(D1.f1==p1),D1.f2(if D1.f1==p2)>
when D1.f2==D2.f0.
Is that possible and how?
Could you show me a simple snippet?

Thanks in advance,
Flavio




On Thu, Apr 16, 2015 at 9:48 PM, Till Rohrmann <trohrmann@apache.org> wrote:

> You can materialize the input of the right input by creating an array out
> of it, for example. Then you can reiterate over it.
>
> Cheers,
> Till
> On Apr 16, 2015 7:37 PM, "Flavio Pompermaier" <pompermaier@okkam.it>
> wrote:
>
>> Hi Maximilian,
>> I tried your solution but it doesn't work because the rightElements
>> iterator cannot be used more than once:
>>
>> Caused by: org.apache.flink.util.TraversableOnceException: The Iterable
>> can be iterated over only once. Only the first call to 'iterator()' will
>> succeed.
>>
>> On Wed, Apr 15, 2015 at 12:59 PM, Maximilian Michels <mxm@apache.org>
>> wrote:
>>
>>> Hi Flavio,
>>>
>>> Here's an simple example of a Left Outer Join:
>>> https://gist.github.com/mxm/c2e9c459a9d82c18d789
>>>
>>> As Stephan pointed out, this can be very easily modified to construct a
>>> Right Outer Join (just exchange leftElements and rightElements in the two
>>> loops).
>>>
>>> Here's an excerpt with the most important part, the coGroup function:
>>>
>>> public static class LeftOuterJoin implements CoGroupFunction<Tuple2<Integer,
String>, Tuple2<Integer, String>, Tuple2<Integer, Integer>> {
>>>
>>>    @Override
>>>    public void coGroup(Iterable<Tuple2<Integer, String>> leftElements,
>>>                        Iterable<Tuple2<Integer, String>> rightElements,
>>>                        Collector<Tuple2<Integer, Integer>> out) throws
Exception {
>>>
>>>       final int NULL_ELEMENT = -1;
>>>
>>>       for (Tuple2<Integer, String> leftElem : leftElements) {
>>>          boolean hadElements = false;
>>>          for (Tuple2<Integer, String> rightElem : rightElements) {
>>>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, rightElem.f0));
>>>             hadElements = true;
>>>          }
>>>          if (!hadElements) {
>>>             out.collect(new Tuple2<Integer, Integer>(leftElem.f0, NULL_ELEMENT));
>>>          }
>>>       }
>>>
>>>    }
>>> }
>>>
>>>
>>>
>>> On Wed, Apr 15, 2015 at 11:01 AM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> I think this may be a great example to add as a utility function.
>>>>
>>>> Or actually add as an function to the DataSet, internally realized as a
>>>> special case of coGroup.
>>>>
>>>> We do not have a ready example of that, but it should be
>>>> straightforward to realize. Similar as for the join, coGroup on the join
>>>> keys. Inside the coGroup function, emit the combination of all values from
>>>> the two iterators. If one of them is empty (the one that is not outer) then
>>>> emit all values from the outer side.
>>>>
>>>> Greetings,
>>>> Stephan
>>>>
>>>>
>>>> On Wed, Apr 15, 2015 at 10:36 AM, Flavio Pompermaier <
>>>> pompermaier@okkam.it> wrote:
>>>>
>>>>> Do you have an already working example of it? :)
>>>>>
>>>>>
>>>>> On Wed, Apr 15, 2015 at 10:32 AM, Ufuk Celebi <uce@apache.org>
wrote:
>>>>>
>>>>>>
>>>>>> On 15 Apr 2015, at 10:30, Flavio Pompermaier <pompermaier@okkam.it>
>>>>>> wrote:
>>>>>>
>>>>>> >
>>>>>> > Hi to all,
>>>>>> > I have to join two datasets but I'd like to keep all data in
the
>>>>>> left also if there' no right dataset.
>>>>>> > How can you achieve that in Flink? maybe I should use coGroup?
>>>>>>
>>>>>> Yes, currently you have to implement this manually with a coGroup
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>

Mime
View raw message