spark-user mailing list archives

From emiretsk <>
Subject Spark Streaming: Doing operation in Receiver vs RDD
Date Wed, 07 Oct 2015 19:55:17 GMT

I have a Spark Streaming program that consumes messages from Kafka and
has to decrypt and deserialize each message. I can implement this either as a
Kafka deserializer (which will run in a receiver, or in the new receiver-less
Kafka consumer) or as RDD operations. What are the pros/cons of each?
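For illustration, the shared decrypt-then-deserialize step could be factored into a plain helper that either a custom Kafka deserializer or an rdd.map() call invokes. This is only a minimal sketch using JDK AES; the class name, key handling, and "deserialization is just a UTF-8 decode" choice are all assumptions, not from the original post:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the same logic can be called from a Kafka
// Deserializer implementation (runs where the consumer runs) or from
// an RDD map() (runs on the executors).
public final class MessageCodec {
    private static final String ALGO = "AES";  // 16-byte key => AES-128

    // Test helper: produce an encrypted payload like the one Kafka would carry.
    public static byte[] encrypt(byte[] plain, byte[] key) {
        try {
            Cipher c = Cipher.getInstance(ALGO);
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, ALGO));
            return c.doFinal(plain);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Decrypt the raw Kafka payload, then "deserialize" (here just a
    // UTF-8 decode standing in for a real deserializer).
    public static String decryptAndDeserialize(byte[] encrypted, byte[] key) {
        try {
            Cipher c = Cipher.getInstance(ALGO);
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, ALGO));
            return new String(c.doFinal(encrypted), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Keeping the logic in one static helper means only the placement decision changes between the two designs, not the code itself.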

As I see it, doing the operations on RDDs has the following implications:

- Better load balancing and fault tolerance (though I'm not quite sure what
happens when a receiver fails). Also, I'm not sure whether this still holds
with the new receiver-less Kafka consumer, since it creates an RDD partition
for each Kafka partition.
- All functions applied to RDDs need to be either static or part of
serializable objects. This makes using standard/3rd-party Java libraries
harder, since many of their classes aren't serializable.
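On the serializability point, the usual workaround is to hide the non-serializable 3rd-party object behind a Serializable holder with a transient field that is re-created lazily on each executor. A minimal sketch (the class names and the dummy "client" are hypothetical, not any real library):

```java
import java.io.*;

// Stand-in for a 3rd-party library class that is NOT Serializable.
final class NonSerializableClient {
    String decrypt(String s) { return s.toLowerCase(); }  // dummy work
}

// Serializable wrapper: the client itself is transient, so it is not
// shipped with the closure; it is rebuilt lazily after deserialization.
public class LazyClientHolder implements Serializable {
    private transient NonSerializableClient client;

    private NonSerializableClient client() {
        if (client == null) client = new NonSerializableClient();
        return client;
    }

    public String process(String msg) { return client().decrypt(msg); }

    // Demonstration helper: Java-serialize and deserialize the holder,
    // mimicking what Spark does when shipping a task to an executor.
    public static LazyClientHolder roundTrip(LazyClientHolder h) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(h);
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            return (LazyClientHolder) in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The holder survives serialization because the offending field is never serialized; each executor builds its own client on first use.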

