flink-issues mailing list archives

From "Robert Waury (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-1141) Selfjoin fails after DataSet exceeds certain size
Date Thu, 16 Oct 2014 12:16:33 GMT

     [ https://issues.apache.org/jira/browse/FLINK-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Waury updated FLINK-1141:
    Attachment: execution_plan.json

Flink seems to ignore my join hints (see the attached execution_plan.json).

This was tested with the current 0.7-incubating-SNAPSHOT.

> Selfjoin fails after DataSet exceeds certain size
> -------------------------------------------------
>                 Key: FLINK-1141
>                 URL: https://issues.apache.org/jira/browse/FLINK-1141
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime, Local Runtime
>    Affects Versions: 0.6.1-incubating, 0.7-incubating
>         Environment: LocalExecutionEnvironment (dop=4)
>            Reporter: Robert Waury
>            Priority: Minor
>         Attachments: LargeSelfJoin.java, execution_plan.json
>
> As soon as a DataSet exceeds a certain size (1000000 tuples in my example), a self-join with a
> FlatJoinFunction no longer works. After around a second, the Join, DataSource and DataSink
> threads are all waiting and do no work (no output files are created), and the job
> never finishes.
> If I cut the input size in half it works fine.
> My current workaround is to create the DataSet twice and join the two identical DataSets.

This message was sent by Atlassian JIRA
