spark-user mailing list archives

From Sourabh Chandak <>
Subject spark.streaming.kafka.maxRatePerPartition for direct stream
Date Thu, 01 Oct 2015 20:39:15 GMT

I am writing a Spark Streaming job that uses the direct stream method for Kafka,
and I want to handle the case of checkpoint failure, where we have to
reprocess all the data from the beginning. By default, with every new
checkpoint the job tries to load everything from each partition, which takes a
long time to process. After some searching I found the config
spark.streaming.kafka.maxRatePerPartition, which can be used
to tackle this. My question: what would be a suitable range for this
config if we have ~12 million messages in Kafka with a maximum message size
of ~10 MB?
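
To make the question concrete, here is a rough sizing sketch. The setting caps the number of messages read per second from each Kafka partition, so one batch holds at most rate x partitions x batch interval messages. The partition count, batch interval, and memory budget below are placeholder assumptions, not values from our setup, and the helper names are made up for illustration.

```python
# Rough sizing sketch for spark.streaming.kafka.maxRatePerPartition.
# All numbers marked "assumed" are placeholders, not measured values.

def max_messages_per_batch(rate_per_partition, num_partitions, batch_interval_s):
    """Upper bound on messages in one batch under the per-partition rate cap."""
    return rate_per_partition * num_partitions * batch_interval_s


def rate_for_memory_budget(budget_bytes, max_message_bytes,
                           num_partitions, batch_interval_s):
    """Largest per-partition rate (messages/sec) whose worst-case batch,
    with every message at the maximum size, still fits in budget_bytes."""
    messages_per_batch = budget_bytes // max_message_bytes
    return max(1, messages_per_batch // (num_partitions * batch_interval_s))


MAX_MESSAGE_BYTES = 10 * 1024 ** 2   # ~10 MB, from the question
TOTAL_MESSAGES = 12_000_000          # ~12 million, from the question
NUM_PARTITIONS = 10                  # assumed
BATCH_INTERVAL_S = 10                # assumed
BATCH_BUDGET_BYTES = 2 * 1024 ** 3   # assumed: 2 GiB of raw data per batch

rate = rate_for_memory_budget(BATCH_BUDGET_BYTES, MAX_MESSAGE_BYTES,
                              NUM_PARTITIONS, BATCH_INTERVAL_S)
per_batch = max_messages_per_batch(rate, NUM_PARTITIONS, BATCH_INTERVAL_S)
# Ceiling division: batches needed to drain the backlog at the capped rate.
batches_to_drain = -(-TOTAL_MESSAGES // per_batch)

print("maxRatePerPartition:", rate)
print("max messages per batch:", per_batch)
print("batches to drain backlog:", batches_to_drain)

# The resulting value would then be passed to the job, for example:
#   spark-submit --conf spark.streaming.kafka.maxRatePerPartition=<rate> ...
```

This only bounds the worst case where every message is the maximum size. If typical messages are much smaller than 10 MB, the rate could presumably be set far higher.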

