flink-dev mailing list archives

From "Greg Hogan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-3910) New self-join operator
Date Fri, 13 May 2016 18:18:13 GMT
Greg Hogan created FLINK-3910:

             Summary: New self-join operator
                 Key: FLINK-3910
                 URL: https://issues.apache.org/jira/browse/FLINK-3910
             Project: Flink
          Issue Type: New Feature
          Components: DataSet API, Java API, Scala API
    Affects Versions: 1.1.0
            Reporter: Greg Hogan
            Assignee: Greg Hogan

Flink currently provides inner and outer joins, as well as cogroup and the non-keyed cross.
{{JoinOperator}} hints at future support for semi-joins and anti-joins.

Many Gelly algorithms perform a self-join [0]. Two issues are still pending review: FLINK-3768
performs a self-join on non-skewed data in TriangleListing.java, and FLINK-3780 performs a
self-join on skewed data in JaccardSimilarity.java. A {{SelfJoinHint}} would select between the
skewed and non-skewed implementations.
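
To make the semantics concrete, here is a minimal plain-Java sketch of a keyed self-join: records are grouped by key and every unordered pair within a group is emitted. It does not use the Flink API and is not the code from FLINK-3768 or FLINK-3780; the class name {{SelfJoinSketch}}, the method {{selfJoin}}, and the {{int[]}} record layout are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Flink or Gelly.
class SelfJoinSketch {

    /**
     * Joins a list of {key, value} records with itself on the key.
     * Emits {key, a, b} once for every unordered pair of values (a, b)
     * that share a key, so a group of n values yields n*(n-1)/2 results.
     */
    static List<int[]> selfJoin(List<int[]> records) {
        // Group values by key, preserving first-seen key order.
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int[] r : records) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }

        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> g : groups.entrySet()) {
            List<Integer> values = g.getValue();
            // Pair each value only with the values that follow it,
            // which avoids both (a, a) and the mirrored pair (b, a).
            for (int i = 0; i < values.size(); i++) {
                for (int j = i + 1; j < values.size(); j++) {
                    out.add(new int[] {g.getKey(), values.get(i), values.get(j)});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> edges = Arrays.asList(
            new int[] {1, 2}, new int[] {1, 3}, new int[] {1, 4}, new int[] {2, 5});
        for (int[] t : selfJoin(edges)) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

Because the output of a group grows quadratically with its size, a few very large groups can dominate the run time, which is presumably why skewed and non-skewed inputs call for different implementations.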

The object-reuse-disabled case can be handled simply with a new {{Operator}}. The object-reuse-enabled
case requires either {{CopyableValue}} types (as in the code referenced above) or a custom driver
with access to the serializer (or making the serializer accessible to rich functions, and I
think there be dragons).
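
The reason object reuse complicates a self-join can be shown with a small plain-Java sketch. A self-join has to buffer the records of a group, but with object reuse the runtime may hand out the same instance repeatedly, so buffered references all end up pointing at the last record unless each one is copied first. Nothing below is Flink code: {{MutableLong}}, {{reusingIterator}}, and {{pairs}} are hypothetical names, and the {{copy()}} method only stands in for what a copyable value type or a serializer would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not part of Flink.
class ObjectReuseSketch {

    /** Minimal mutable value type that knows how to copy itself. */
    static final class MutableLong {
        long value;
        MutableLong(long value) { this.value = value; }
        MutableLong copy() { return new MutableLong(value); }
    }

    /** Iterator that mimics object reuse: every next() returns the same instance. */
    static Iterator<MutableLong> reusingIterator(final long[] data) {
        return new Iterator<MutableLong>() {
            private final MutableLong reused = new MutableLong(0);
            private int pos = 0;

            @Override
            public boolean hasNext() { return pos < data.length; }

            @Override
            public MutableLong next() {
                if (pos >= data.length) throw new NoSuchElementException();
                reused.value = data[pos++]; // overwrite the shared instance
                return reused;
            }
        };
    }

    /** Buffers one group and emits all unordered pairs of its values. */
    static List<long[]> pairs(Iterator<MutableLong> group, boolean copyBeforeBuffering) {
        List<MutableLong> buffer = new ArrayList<>();
        while (group.hasNext()) {
            MutableLong v = group.next();
            // Without the copy, every buffer slot aliases the same object.
            buffer.add(copyBeforeBuffering ? v.copy() : v);
        }
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < buffer.size(); i++) {
            for (int j = i + 1; j < buffer.size(); j++) {
                out.add(new long[] {buffer.get(i).value, buffer.get(j).value});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3};
        for (long[] p : pairs(reusingIterator(data), true)) {
            System.out.println("copied:  " + Arrays.toString(p));
        }
        for (long[] p : pairs(reusingIterator(data), false)) {
            System.out.println("aliased: " + Arrays.toString(p));
        }
    }
}
```

With copying enabled the pairs hold the distinct input values; without it, every pair holds the last value read twice, because all buffered references point at the one reused instance.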

If the idea of a self-join is agreeable, I'd like to work out a rough implementation and go
from there.

[0] https://en.wikipedia.org/wiki/Join_%28SQL%29#Self-join

