flink-dev mailing list archives

From "Robert Waury (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1141) Selfjoin fails after DataSet exceeds certain size
Date Wed, 08 Oct 2014 11:01:33 GMT
Robert Waury created FLINK-1141:

             Summary: Selfjoin fails after DataSet exceeds certain size
                 Key: FLINK-1141
                 URL: https://issues.apache.org/jira/browse/FLINK-1141
             Project: Flink
          Issue Type: Bug
          Components: Local Runtime
    Affects Versions: 0.6.1-incubating
         Environment: LocalExecutionEnvironment (dop=4)
            Reporter: Robert Waury
            Priority: Minor

Once a DataSet exceeds a certain size (1,000,000 tuples in my example), a self-join with
a FlatJoinFunction hangs. After about a second, the Join, DataSource and DataSink threads
are all waiting and perform no work: no output files are created and the job never
finishes.

If I cut the input size in half, the job completes normally.

My current workaround is to create the DataSet twice and join the two identical DataSets.
