spark-dev mailing list archives

From coolcoolkid <>
Subject Re: How does MapWithStateRDD distribute the data
Date Fri, 16 Jun 2017 08:47:52 GMT

I have run into a situation just like the one described above. I am
running a Spark Streaming application with 2 executors, each with 16 cores
and 10G of memory, and the input Kafka topic has 64 partitions.

My code looks like this:
.map(s => (k, v))

I also expected the 32 partitions of the MapWithStateRDD to be
distributed evenly between the two executors, but it turned out that all
32 were on one of them.

I noticed that you replied 'Are you using KafkaUtils.createDirectStream?'
and I was wondering whether the Kafka direct stream leads to this situation.
Or is there something else?

Thanks a lot!
