spark-user mailing list archives

From emiretsk <eugene.miret...@gmail.com>
Subject Spark Streaming: Doing operation in Receiver vs RDD
Date Wed, 07 Oct 2015 19:55:17 GMT
Hi,

I have a Spark Streaming program that consumes messages from Kafka and
has to decrypt and deserialize each message. I can implement it either as a
Kafka deserializer (which will run in a receiver, or in the new receiver-less
Kafka consumer) or as RDD operations. What are the pros/cons of each?

As I see it, doing the operations on RDDs has the following implications:

- Better load balancing and fault tolerance (though I'm not quite sure what
  happens when a receiver fails). I'm also not sure whether this still holds
  with the new receiver-less Kafka consumer, since it creates one RDD
  partition per Kafka partition.
- All functions applied to RDDs need to be either static or part of
  serializable objects, which makes using standard/3rd-party Java libraries
  harder.
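To make the second point concrete, here is a minimal, Spark-free sketch of the constraint, with `LegacyDecryptor` as a made-up stand-in for a non-serializable third-party library object. Spark ships the closure you pass to `rdd.map(...)` to executors via Java serialization, so a closure that captures such an object fails, while one that constructs it on the executor side (in real Spark code, typically once per partition inside `mapPartitions`) does not:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical third-party helper that does NOT implement Serializable,
// standing in for a real decryption/deserialization library.
class LegacyDecryptor {
    byte[] decrypt(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (byte) (in[i] ^ 0x2A);
        return out;
    }
}

public class SerializationCheck {
    // A serializable function type, analogous to what Spark requires of closures.
    interface SerFn extends Function<byte[], byte[]>, Serializable {}

    // Captures the driver-side helper: the closure drags it along when serialized.
    static SerFn capturing(LegacyDecryptor shared) {
        return msg -> shared.decrypt(msg);
    }

    // Constructs the helper inside the closure: nothing non-serializable is captured.
    static SerFn selfContained() {
        return msg -> new LegacyDecryptor().decrypt(msg);
    }

    // Roughly what happens to a task closure before it is shipped to an executor.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false; // e.g. java.io.NotSerializableException: LegacyDecryptor
        }
    }

    public static void main(String[] args) {
        System.out.println(isSerializable(capturing(new LegacyDecryptor()))); // false
        System.out.println(isSerializable(selfContained()));                  // true
    }
}
```

The same trade-off is what the bullet above is getting at: wrapping the library call in a static method or a serializable holder works, but it is extra friction compared to running the decryption inside the receiver/consumer.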
Cheers,
Eugene



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Doing-operation-in-Receiver-vs-RDD-tp24973.html


