spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Soumitra Johri <>
Subject How does MapWithStateRDD distribute the data
Date Wed, 03 Aug 2016 14:42:30 GMT

I am running a steaming job with 4 executors and 16 cores so that each
executor has two cores to work with. The input Kafka topic has 4 partitions.
With this given configuration I was expecting MapWithStateRDD to be evenly
distributed across all executors, how ever I see that it uses only two
executors on which MapWithStateRDD data is distributed. Sometimes the data
goes only to one executor.

How can this be explained and pretty sure there would be some math to
understand this behavior.

I am using the standard standalone 1.6.2 cluster.


View raw message