spark-dev mailing list archives

From coolcoolkid <coolcool...@163.com>
Subject Re: How does MapWithStateRDD distribute the data
Date Fri, 16 Jun 2017 08:47:52 GMT
Hello,

I have run into the same situation described above. I am running a Spark
Streaming application with 2 executors, each with 16 cores and 10 GB of
memory, and the input Kafka topic has 64 partitions.

My code looks like this:
--------------------------------------------
KafkaUtils.createDirectStream(...) 
...
.map(s => (k, v))
.mapWithState(...numPartitions(32))
...
.foreachRDD(_.foreachPartition(output))
--------------------------------------------

I also expected the 32 partitions of the MapWithStateRDD to be distributed
evenly between the two executors, but all 32 of them ended up on a single
executor.
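
For reference, here is a minimal sketch of what numPartitions(32) controls,
assuming the hash partitioning that StateSpec.numPartitions is documented to
use. It only decides which of the 32 state partitions a key lands in. It says
nothing about which executor hosts a given partition. The object name, the
helper names and the "user-N" keys are made up for illustration and are not
taken from my application.

```scala
object HashPartitionSketch {
  // Non-negative modulo, so negative hash codes still map into [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index for a key under plain hash partitioning (null goes to 0).
  def partitionFor(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case k    => nonNegativeMod(k.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical keys, only there to show the spread over 32 partitions.
    val keys = (1 to 1000).map(i => s"user-$i")
    val counts = keys.groupBy(k => partitionFor(k, 32)).map { case (p, ks) => p -> ks.size }
    counts.toSeq.sortBy(_._1).foreach { case (p, n) => println(s"partition $p: $n keys") }
  }
}
```

So the keys can be spread evenly over the 32 partitions while the tasks that
process those partitions still all run on one executor.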

I noticed that you replied 'Are you using KafkaUtils.createDirectStream?',
and I was wondering whether the Kafka direct stream leads to this situation.
Or is there something else?

Thanks a lot!
