flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-9289) Parallelism of generated operators should have max parallelism of input
Date Wed, 02 May 2018 13:09:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Hueske updated FLINK-9289:
---------------------------------
    Description: 
The DataSet API aims to chain generated operators such as key extraction mappers to their
predecessor. This is done by assigning the same parallelism as the input operator.

If a generated operator has more than two inputs, it can no longer be chained and is
created with the default parallelism. This can lead to a
{code}NoResourceAvailableException: Not enough free slots available to run the job.{code}
error, as reported by a user on the mailing list:
https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E

I suggest setting the parallelism of a generated operator to the maximum parallelism of all
of its inputs to fix this problem.
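
To illustrate the proposed rule, here is a minimal sketch in plain Java. It is not Flink code and not the actual fix: the class {{GeneratedOperatorParallelism}}, the method {{resolve}}, and the {{USE_DEFAULT}} sentinel are hypothetical names chosen for this example. The sketch assumes that an input without an explicitly set parallelism is represented by a non-positive value, and that the default parallelism is used only if no input sets one.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Flink: it only illustrates the proposed rule.
final class GeneratedOperatorParallelism {

    // Assumed sentinel for "no parallelism set, use the default".
    static final int USE_DEFAULT = -1;

    // Returns the largest explicitly set input parallelism. Falls back to
    // defaultParallelism if no input has a positive parallelism.
    static int resolve(int defaultParallelism, int... inputParallelisms) {
        int max = Arrays.stream(inputParallelisms)
                .filter(p -> p > 0)   // ignore inputs without an explicit setting
                .max()
                .orElse(USE_DEFAULT);
        return max == USE_DEFAULT ? defaultParallelism : max;
    }

    public static void main(String[] args) {
        // Inputs run with parallelism 1 and 2; the default parallelism is 8.
        // The generated operator follows its inputs instead of the default.
        System.out.println(resolve(8, 1, 2)); // prints 2
    }
}
```

With this rule, a generated operator never requests more parallel instances than its widest input, so it would not be the reason a job needs more slots than its inputs already require.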

Until the problem is fixed, a workaround is to set the default parallelism on the {{ExecutionEnvironment}}:

{code}
ExecutionEnvironment env = ...
env.setParallelism(2);
{code}

  was:
The DataSet API aims to chain generated operators such as key extraction mappers to their
predecessor. This is done by assigning the same parallelism as the input operator.

If a generated operator has more than two inputs, it can no longer be chained and is
created with the default parallelism. This can lead to a
{code}NoResourceAvailableException: Not enough free slots available to run the job.{code}
error, as reported by a user on the mailing list:
https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E

I suggest setting the parallelism of a generated operator to the maximum parallelism of all
of its inputs to fix this problem.


> Parallelism of generated operators should have max parallelism of input
> -----------------------------------------------------------------------
>
>                 Key: FLINK-9289
>                 URL: https://issues.apache.org/jira/browse/FLINK-9289
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API
>    Affects Versions: 1.5.0, 1.4.2, 1.6.0
>            Reporter: Fabian Hueske
>            Priority: Major
>
> The DataSet API aims to chain generated operators such as key extraction mappers to their
> predecessor. This is done by assigning the same parallelism as the input operator.
> If a generated operator has more than two inputs, it can no longer be chained and is
> created with the default parallelism. This can lead to a
> {code}NoResourceAvailableException: Not enough free slots available to run the job.{code}
> error, as reported by a user on the mailing list:
> https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E
> I suggest setting the parallelism of a generated operator to the maximum parallelism of all
> of its inputs to fix this problem.
> Until the problem is fixed, a workaround is to set the default parallelism on the {{ExecutionEnvironment}}:
> {code}
> ExecutionEnvironment env = ...
> env.setParallelism(2);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
