beam-user mailing list archives

From <>
Subject cores and partitions in DataFlow
Date Fri, 14 Sep 2018 01:34:04 GMT
Spark has two levels of parallelism:
a) across different workers;
b) within the same executor, where multiple cores can work on different partitions.

I know that in Apache Beam with Dataflow as the runner, partitioning is abstracted away. But does Dataflow
use multiple cores to process different partitions at the same time?

My objective is to understand which machines should be used to run pipelines. Should one
give thought to the number of cores per machine, or does it not matter?
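
To make the question concrete, here is a minimal sketch of the two worker shapes being compared. It assumes the Beam Python SDK's Dataflow flags `--worker_machine_type` and `--num_workers`. The helper `dataflow_worker_flags` is hypothetical, and the machine types are only examples. It builds flag lists and does not launch anything.

```python
from typing import List


def dataflow_worker_flags(machine_type: str, num_workers: int) -> List[str]:
    """Hypothetical helper: build Dataflow pipeline flags for one worker shape."""
    return [
        "--runner=DataflowRunner",
        f"--worker_machine_type={machine_type}",  # example machine type, an assumption
        f"--num_workers={num_workers}",
    ]


# Two shapes with the same total of 8 vCPUs:
# a few multi-core workers versus many single-core workers.
few_big = dataflow_worker_flags("n1-standard-4", 2)
many_small = dataflow_worker_flags("n1-standard-1", 8)
```

Either list could presumably be passed to `PipelineOptions(...)` from `apache_beam.options.pipeline_options`. The question is whether the choice between the two shapes affects how work is parallelized.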

