flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Containment Join Support
Date Thu, 16 Jul 2015 22:26:12 GMT
Hi Martin,

good to hear that you like Flink :-)
AFAIK, there are no plans to add a containment join. The Flink community is
currently working on adding support for outer joins.
Regarding a containment join, I am not sure about the number of use cases.
I would rather try to implement it on top of Flink's batch API instead of
adding it as an internal feature/operator to the system because this would
touch a lot of things (API, optimizer, operator implementation).

There might be better ways to implement a containment join than using a
cross and a filter.
- Do you know a distributed algorithm for containment joins? Maybe it can
be implemented with Flink's API.
- I guess, you are implementing a generic graph framework, but can you make
certain assumptions about the data such as relative sizes of the inputs or
avg/max size of the lists, etc.?

Contributions to Gelly (and Flink in general) are highly welcome.

Best, Fabian


2015-07-16 9:39 GMT+02:00 Martin Junghanns <martin.junghanns@gmx.net>:

> Hi everyone,
>
> at first, thanks for building this great framework! We are using Flink
> and especially Gelly for building a graph analytics stack (gradoop.com).
>
> I was wondering if there is a [planned] support for a containment join
> operator. Consider the following example:
>
> DataSet<List<Int>> left := {[0, 1], [2, 3, 4], [5]}
> DataSet<Tuple2<Int, Int>> right := {<0, 1>, <1, 0>, <2, 1>,
<5, 2>}
>
> What I want to compute is
>
> left.join(right).where(list).contains(tuple.f0) :=
>
> {
> <[0, 1], <0,1>>, <[0, 1], <1, 0>>,
> <[2, 3, 4], <2, 1>>,
> <[5], <5, 2>
> }
>
> At the moment, I am solving that using cross and filter, which can be
> expensive.
>
> The generalization of that operator would be "set containment join",
> where you join if the right set is contained in the left set.
>
> If there is a general need for that operator, I would also like to
> contribute to its implementation.
>
> But maybe, there is already another nice solution which I didn't
> discover yet?
>
> Any help would be appreciated. Especially since I would also like to
> contribute some of our graph operators (e.g., graph summarization) back
> to Flink/Gelly (current WIP state can be found here: [1]).
>
> Thanks,
>
> Martin
>
>
> [1]
>
> https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java
>
>
>

Mime
View raw message