beam-commits mailing list archives

From "Raghu Angadi (JIRA)" <>
Subject [jira] [Created] (BEAM-958) desiredNumWorkers in Dataflow is too low
Date Thu, 10 Nov 2016 20:11:58 GMT
Raghu Angadi created BEAM-958:

             Summary: desiredNumWorkers in Dataflow is too low
                 Key: BEAM-958
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
    Affects Versions: 0.3.0-incubating
            Reporter: Raghu Angadi
            Assignee: Davor Bonaci

{{desiredNumWorkers}} in [UnboundedSource API|]
is a suggestion to a source about how many splits it should create. KafkaIO currently takes
this literally and only creates up to this many splits.

The main drawback is that this value is very low in Dataflow. It is calculated as
  * {{1 * maxNumWorkers}} if {{--maxNumWorkers}} is specified, otherwise
  * {{3 * numWorkers}}.

With {{--maxNumWorkers}} set, that implies there is only a single reader per worker (which is
usually a 4-core VM). That can leave the CPU underutilized on many pipelines.
Even 3x, in the case of a fixed number of workers, seems low to me.
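
To make the arithmetic concrete, here is a minimal Java sketch of the rule as described above. The class and method names ({{DesiredNumWorkersSketch}}, {{desiredNumWorkers}}, {{readersPerWorker}}) and the sample worker counts are hypothetical and exist only for illustration; this is not the Dataflow runner's actual code. It assumes a source that creates exactly as many splits as suggested.

```java
import java.util.OptionalInt;

public final class DesiredNumWorkersSketch {

    // Hypothetical helper mirroring the rule described in this issue:
    // 1 * maxNumWorkers if --maxNumWorkers is set, otherwise 3 * numWorkers.
    static int desiredNumWorkers(int numWorkers, OptionalInt maxNumWorkers) {
        return maxNumWorkers.isPresent()
                ? 1 * maxNumWorkers.getAsInt()
                : 3 * numWorkers;
    }

    // Readers per worker when the source creates exactly `splits` splits
    // and they are spread evenly over `workers` workers (an assumption).
    static double readersPerWorker(int splits, int workers) {
        return (double) splits / workers;
    }

    public static void main(String[] args) {
        // Illustrative values only: autoscaling job capped at 20 workers.
        int withMax = desiredNumWorkers(5, OptionalInt.of(20));
        System.out.println(withMax);                        // 20
        System.out.println(readersPerWorker(withMax, 20));  // 1.0

        // Illustrative values only: fixed-size job with 5 workers.
        int fixed = desiredNumWorkers(5, OptionalInt.empty());
        System.out.println(fixed);                          // 15
        System.out.println(readersPerWorker(fixed, 5));     // 3.0
    }
}
```

At full scale, the first case yields one reader per worker and the second yields three, which on a 4-core VM leaves at least one core without a dedicated reader.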
