flink-user mailing list archives

From Martin Junghanns <martin.jungha...@gmx.net>
Subject Containment Join Support
Date Thu, 16 Jul 2015 07:39:55 GMT
Hi everyone,

First of all, thanks for building this great framework! We are using Flink,
and especially Gelly, to build a graph analytics stack (gradoop.com).

I was wondering whether there is existing or planned support for a
containment join operator. Consider the following example:

DataSet<List<Int>> left := {[0, 1], [2, 3, 4], [5]}
DataSet<Tuple2<Int, Int>> right := {<0, 1>, <1, 0>, <2, 1>,
<5, 2>}

What I want to compute is

left.join(right).where(list).contains(tuple.f0) :=

{
<[0, 1], <0,1>>, <[0, 1], <1, 0>>,
<[2, 3, 4], <2, 1>>,
<[5], <5, 2>>
}

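At the moment, I am solving this with cross and filter, which can be
expensive because the cross product materializes all |left| x |right|
pairs before any of them are filtered out.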
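A minimal sketch of one common way to avoid the cross product: flatten each list into (element, list) entries and equi-join them with right on f0. It is written in plain Java rather than against the Flink API so that it runs standalone; in a Flink job the same idea would roughly correspond to a flatMap over left followed by an equi-join such as join(...).where(0).equalTo(0). The Pair record and the containmentJoin method are hypothetical stand-ins (Pair plays the role of Tuple2), not existing Flink or Gelly APIs.

```java
import java.util.*;

public class ElementContainmentJoin {

    // Hypothetical stand-in for Flink's Tuple2, for illustration only.
    public record Pair<A, B>(A f0, B f1) {}

    // Joins each left list with every right tuple whose f0 is an element of that list.
    // Step 1 flattens each list into (element -> list) entries;
    // step 2 probes that index with right.f0, so no cross product is built.
    public static List<Pair<List<Integer>, Pair<Integer, Integer>>> containmentJoin(
            List<List<Integer>> left, List<Pair<Integer, Integer>> right) {
        Map<Integer, List<List<Integer>>> byElement = new HashMap<>();
        for (List<Integer> list : left) {
            // Distinct elements only, so a list with duplicates yields each match once.
            for (Integer e : new LinkedHashSet<>(list)) {
                byElement.computeIfAbsent(e, k -> new ArrayList<>()).add(list);
            }
        }
        List<Pair<List<Integer>, Pair<Integer, Integer>>> result = new ArrayList<>();
        for (Pair<Integer, Integer> t : right) {
            for (List<Integer> list : byElement.getOrDefault(t.f0(), List.of())) {
                result.add(new Pair<>(list, t));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> left = List.of(List.of(0, 1), List.of(2, 3, 4), List.of(5));
        List<Pair<Integer, Integer>> right = List.of(
                new Pair<>(0, 1), new Pair<>(1, 0), new Pair<>(2, 1), new Pair<>(5, 2));
        for (var p : containmentJoin(left, right)) {
            System.out.println(p.f0() + " " + p.f1());
        }
    }
}
```

On the example above this produces the same four pairs as the expected result. The flattened side grows with the total number of list elements rather than with |left| x |right|, which is where the saving over cross and filter comes from.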

The generalization of this operator would be a "set containment join",
where a left set and a right set are joined if the right set is
contained in the left set.
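One well-known technique for set containment joins is an inverted index: map every element to the left sets that contain it, then take as candidates for a right set the intersection of its elements' posting lists. The plain-Java sketch below illustrates that technique under the assumption that both sides fit in memory. It is not a Flink operator, and the class and method names are hypothetical.

```java
import java.util.*;

public class SetContainmentJoin {

    // Returns all {leftIndex, rightIndex} pairs such that right.get(j) is a subset of left.get(i).
    // Inverted index: element -> indices of left sets that contain it.
    public static List<int[]> join(List<Set<Integer>> left, List<Set<Integer>> right) {
        Map<Integer, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < left.size(); i++) {
            for (Integer e : left.get(i)) {
                index.computeIfAbsent(e, k -> new TreeSet<>()).add(i);
            }
        }
        List<int[]> result = new ArrayList<>();
        for (int j = 0; j < right.size(); j++) {
            Set<Integer> candidates = null;
            for (Integer e : right.get(j)) {
                Set<Integer> posting = index.getOrDefault(e, Set.of());
                if (candidates == null) {
                    candidates = new TreeSet<>(posting);
                } else {
                    candidates.retainAll(posting); // intersect posting lists
                }
                if (candidates.isEmpty()) break; // no left set can contain right.get(j)
            }
            if (candidates == null) {
                // The empty right set is contained in every left set.
                candidates = new TreeSet<>();
                for (int i = 0; i < left.size(); i++) candidates.add(i);
            }
            for (int i : candidates) result.add(new int[] {i, j});
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<Integer>> left = List.of(Set.of(0, 1), Set.of(2, 3, 4), Set.of(5));
        List<Set<Integer>> right = List.of(Set.of(0, 1), Set.of(3), Set.of(4, 5), Set.of());
        for (int[] p : join(left, right)) {
            System.out.println("left " + p[0] + " contains right " + p[1]);
        }
    }
}
```

The element containment join from the example is the special case where every right set has exactly one element. In that case the intersection step disappears and the index lookup alone is enough.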

If there is a general need for that operator, I would also like to
contribute to its implementation.

But maybe there is already another nice solution that I haven't
discovered yet?

Any help would be appreciated, especially since I would also like to
contribute some of our graph operators (e.g., graph summarization) back
to Flink/Gelly (the current WIP state can be found here: [1]).

Thanks,

Martin


[1]
https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java


