flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-3885) Revisit parallelism logic
Date Mon, 03 Apr 2017 12:32:41 GMT

     [ https://issues.apache.org/jira/browse/FLINK-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Aljoscha Krettek updated FLINK-3885:
    Component/s:     (was: Streaming)
                 DataStream API

> Revisit parallelism logic
> -------------------------
>                 Key: FLINK-3885
>                 URL: https://issues.apache.org/jira/browse/FLINK-3885
>             Project: Flink
>          Issue Type: Bug
>          Components: DataStream API
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
> Flink has multiple ways to specify the parallelism of programs. One can set it directly
at the operator or job-wide via the {{ExecutionConfig}}. Then it is also possible to define
a default parallelism in the {{flink-conf.yaml}}. This default parallelism is only respected
on the client side.
> For batch jobs, the default parallelism is consistently set in the {{Optimizer}}. However,
for streaming jobs the default parallelism is only set iff the {{StreamExecutionEnvironment}}
has been created via {{StreamExecutionEnvironment.getExecutionEnvironment()}}. If one creates
a {{RemoteStreamEnvironment}}, then the default parallelism won't be respected. Instead the
parallelism is {{-1}}. Also the {{JobManager}} does not respect the default parallelism. But
whenever it sees a parallelism of {{-1}} it sets it to {{1}}. This behaviour is not consistent
with the batch behaviour and the behaviour described in the documentation of {{parallelism.default}}.
> On top of that we also have the {{PARALLELISM_AUTO_MAX}} value for the parallelism. This
value tells the system to use all available slots. However, this feature is nowhere documented
in our docs.
> I would propose to get rid of {{PARALLELISM_AUTO_MAX}} and to make the default parallelism
behaviour in the batch and streaming API consistent.

This message was sent by Atlassian JIRA

View raw message