beam-commits mailing list archives

From "Raghu Angadi (JIRA)" <>
Subject [jira] [Commented] (BEAM-958) desiredNumWorkers in Dataflow is too low
Date Thu, 10 Nov 2016 20:14:58 GMT


Raghu Angadi commented on BEAM-958:

A change to this policy can break Dataflow job update, depending on the source, because an
update requires the number of sources to remain the same across the update. The native pubsub
source is not affected.

> desiredNumWorkers in Dataflow is too low
> ----------------------------------------
>                 Key: BEAM-958
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>    Affects Versions: 0.3.0-incubating
>            Reporter: Raghu Angadi
>            Assignee: Davor Bonaci
>              Labels: breaking_change
> {{desiredNumWorkers}} in [UnboundedSource API|]
> is a suggestion to a source about how many splits it should create. KafkaIO currently takes
> this literally and only creates up to this many splits.
> The main drawback is that it is very low in Dataflow. It is calculated as
>   * {{1 * maxNumWorkers}} if {{--maxNumWorkers}} is specified, otherwise
>   * {{3 * numWorkers}}.
> That implies there is only a single reader per worker (which is usually a 4-core VM). That
> can leave CPU underutilized on many pipelines.
> Even 3x in the case of a fixed number of workers seems low to me.
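
To make the arithmetic in the description concrete, here is a minimal Java sketch of the rule as stated above. It is not the Dataflow runner's actual code: the class {{DesiredSplitsSketch}} and both helper methods are hypothetical names, and the partition count in the example is an assumed value chosen only for illustration.

```java
import java.util.OptionalInt;

// Hypothetical illustration of the policy described in this issue.
// None of these names come from Beam or the Dataflow runner.
final class DesiredSplitsSketch {

    // The hint as described: 1 * maxNumWorkers if --maxNumWorkers is set,
    // otherwise 3 * numWorkers.
    static int desiredNumWorkers(int numWorkers, OptionalInt maxNumWorkers) {
        if (maxNumWorkers.isPresent()) {
            return 1 * maxNumWorkers.getAsInt();
        }
        return 3 * numWorkers;
    }

    // A source that takes the hint literally creates at most that many
    // splits, and never more than it has partitions to read.
    static int splitsCreated(int hint, int numPartitions) {
        return Math.min(hint, numPartitions);
    }

    public static void main(String[] args) {
        // Assumed example: 64 partitions, --maxNumWorkers=10.
        int hint = desiredNumWorkers(3, OptionalInt.of(10));
        System.out.println("hint with maxNumWorkers=10: " + hint);
        System.out.println("splits for 64 partitions: " + splitsCreated(hint, 64));

        // Fixed number of workers, no --maxNumWorkers.
        System.out.println("hint with numWorkers=4: "
                + desiredNumWorkers(4, OptionalInt.empty()));
    }
}
```

Under these assumptions, a job at its maximum of 10 workers ends up with one split per worker, which is the single-reader-per-worker situation the description calls out.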

This message was sent by Atlassian JIRA