flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: [jira] [Commented] (FLINK-1141) Selfjoin fails after DataSet exceeds certain size
Date Wed, 08 Oct 2014 12:54:00 GMT
Can you add a compiler hint that forces a merge-join? That one is not
deadlock prone...
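
The suggestion above relies on how a sort-merge join consumes its inputs. The following is a minimal plain-Java sketch of that strategy applied to a self-join, not Flink code: the class name `SortMergeSelfJoin`, the method `selfJoin`, and the `int[] {key, value}` record layout are all hypothetical and chosen only for illustration. The Flink call that sets such a hint is not shown in this thread, so none is assumed here. The point of the sketch is that both sides are fully read and sorted before the merge emits anything, which is presumably why this strategy cannot stall on a shared source the way the reported job does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortMergeSelfJoin {

    /**
     * Joins the input with itself on element [0] of each record.
     * Each result is {key, leftValue, rightValue}.
     */
    public static List<int[]> selfJoin(List<int[]> input) {
        // Phase 1: read both sides completely and sort them by key.
        // Nothing is emitted until both sorts have finished.
        List<int[]> left = new ArrayList<>(input);
        List<int[]> right = new ArrayList<>(input);
        Comparator<int[]> byKey = Comparator.comparingInt(t -> t[0]);
        left.sort(byKey);
        right.sort(byKey);

        // Phase 2: advance two cursors over the sorted sides in lockstep.
        List<int[]> out = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.size() && r < right.size()) {
            int lk = left.get(l)[0];
            int rk = right.get(r)[0];
            if (lk < rk) {
                l++;
            } else if (lk > rk) {
                r++;
            } else {
                // Find the run of equal keys on each side.
                int lEnd = l;
                int rEnd = r;
                while (lEnd < left.size() && left.get(lEnd)[0] == lk) lEnd++;
                while (rEnd < right.size() && right.get(rEnd)[0] == lk) rEnd++;
                // Emit the cross product of the two runs.
                for (int i = l; i < lEnd; i++) {
                    for (int j = r; j < rEnd; j++) {
                        out.add(new int[] {lk, left.get(i)[1], right.get(j)[1]});
                    }
                }
                l = lEnd;
                r = rEnd;
            }
        }
        return out;
    }
}
```

In this sketch the two sorted copies stand in for the fully materialized join inputs. A real runtime would sort in managed memory and spill to disk instead of copying lists.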

On Wed, Oct 8, 2014 at 2:42 PM, Robert Waury (JIRA) <jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/FLINK-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163437#comment-14163437
> ]
>
> Robert Waury commented on FLINK-1141:
> -------------------------------------
>
> It doesn't have a high priority right now since I have a wasteful but easy
> workaround.
>
> Will the blocking shuffles be included in 0.7 or 0.8?
>
> > Selfjoin fails after DataSet exceeds certain size
> > -------------------------------------------------
> >
> >                 Key: FLINK-1141
> >                 URL: https://issues.apache.org/jira/browse/FLINK-1141
> >             Project: Flink
> >          Issue Type: Bug
> >          Components: Local Runtime
> >    Affects Versions: 0.6.1-incubating
> >         Environment: LocalExecutionEnvironment (dop=4)
> >            Reporter: Robert Waury
> >            Priority: Minor
> >         Attachments: LargeSelfJoin.java
> >
> >
> > As soon as a DataSet exceeds a certain size (1000000 tuples in my
> > example) a Selfjoin with a FlatJoinFunction no longer works. After around a
> > second the Join, DataSource and DataSink threads are all in Wait and don't
> > perform any work (no output files are created) and the job never finishes.
> > If I cut the input size in half it works fine.
> > My current workaround is to create the DataSet twice and join the two
> > identical DataSets.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
