flink-dev mailing list archives

From "Sajeev Ramakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4902) Flink Task Chain not getting input in a distributed manner
Date Tue, 25 Oct 2016 03:01:58 GMT
Sajeev Ramakrishnan created FLINK-4902:
------------------------------------------

             Summary: Flink Task Chain not getting input in a distributed manner
                 Key: FLINK-4902
                 URL: https://issues.apache.org/jira/browse/FLINK-4902
             Project: Flink
          Issue Type: Bug
          Components: DataSet API
    Affects Versions: 1.1.0
         Environment: RHEL 6.6
            Reporter: Sajeev Ramakrishnan


Dear Team,

I have the following operators chained into a single task:

left outer join -> filter -> map -> flatMap.

The chain has two inputs:
memberPlan - 22 million records
groupPlan - 1 million records

I am running the entire job with parallelism 16. Before this task chain, the job performs
two other left outer joins.

The problem is that one slot receives all 22 million memberPlan records, while the remaining
15 slots receive only the input from groupPlan.
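
To make the symptom easier to reason about outside Flink, here is a minimal sketch of how records
would spread over 16 slots if they were assigned by hashing a join key. This is only an
illustration under assumptions: the class name SlotSkewCheck, the method recordsPerSlot, and the
sample keys are hypothetical, and plain hash-modulo stands in for whatever partitioning Flink
actually applies. I have not confirmed that key skew is the cause here.

```java
import java.util.Arrays;

public class SlotSkewCheck {

    // Counts how many records land in each slot when every record is
    // assigned to slot = hash(key) mod parallelism.
    // Assumption: this stands in for the real partitioner, which may differ.
    static long[] recordsPerSlot(int[] keys, int parallelism) {
        long[] counts = new long[parallelism];
        for (int key : keys) {
            int slot = Math.floorMod(Integer.hashCode(key), parallelism);
            counts[slot]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int parallelism = 16;

        // Hypothetical worst case: every record carries the same key.
        int[] singleKey = new int[1000];
        Arrays.fill(singleKey, 42);

        // Hypothetical good case: every record carries a distinct key.
        int[] distinctKeys = new int[1000];
        for (int i = 0; i < distinctKeys.length; i++) {
            distinctKeys[i] = i;
        }

        System.out.println("single key:    "
                + Arrays.toString(recordsPerSlot(singleKey, parallelism)));
        System.out.println("distinct keys: "
                + Arrays.toString(recordsPerSlot(distinctKeys, parallelism)));
    }
}
```

If the real key column behaves like the single-key case, one slot would end up with nearly all
of the records, which would match what I am seeing.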

This makes the entire execution very slow, probably by about 4 hours.

Could you please shed some light on this?

Regards,
Sajeev




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
