spark-user mailing list archives

From Ewan Leith <>
Subject RE: Spark Streaming - use the data in different jobs
Date Mon, 19 Oct 2015 09:34:10 GMT
Storing the data in HBase, Cassandra, or similar is possibly the right answer. The other option
that can work well is re-publishing the data back into a second queue on RabbitMQ, to be read
again by the next job.
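A minimal sketch of the re-publish approach, assuming the RabbitMQ Java client is on the classpath and that `messages` is a `DStream[String]` produced by the first job's receiver (the queue name and broker host below are hypothetical):

```scala
import com.rabbitmq.client.ConnectionFactory

// Hypothetical: `messages` is the DStream[String] the streaming job already consumes.
messages.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Open one connection per partition, not per record.
    val factory = new ConnectionFactory()
    factory.setHost("rabbitmq-host")  // assumption: broker hostname
    val connection = factory.newConnection()
    val channel = connection.createChannel()
    // Declare a durable queue for the downstream job to consume from.
    channel.queueDeclare("second-queue", true, false, false, null)
    partition.foreach { msg =>
      channel.basicPublish("", "second-queue", null, msg.getBytes("UTF-8"))
    }
    channel.close()
    connection.close()
  }
}
```

The connection-per-partition pattern matters here: connections are not serializable, so they must be created inside `foreachPartition` on the executors rather than on the driver.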


From: Oded Maimon []
Sent: 18 October 2015 12:49
To: user <>
Subject: Spark Streaming - use the data in different jobs

We've built a Spark Streaming process that gets data from a pub/sub system (RabbitMQ in our case).

Now we want the streamed data to be used in different Spark jobs (also in real time, if possible).

What options do we have for doing that?

  *   Can the streaming process and different Spark jobs share/access the same RDDs?
  *   Can the streaming process create a Spark SQL table that other jobs read/use?
  *   Can a Spark Streaming process trigger other Spark jobs and send them the data (in memory)?
  *   Can a Spark Streaming process cache the data in memory so that other scheduled jobs access
the same RDDs?
  *   Should we persist the data to HBase and read it from the other jobs?
  *   Other ways?

I believe the answer will be using an external DB/storage... hoping there's a different
solution :)
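For the external-storage option raised above, a hedged sketch using the DataStax spark-cassandra-connector (an assumption here; the original thread names HBase and Cassandra as candidates). It presumes a keyspace `events` with a table `raw`, and that `messages` is the receiver's `DStream[String]` (all hypothetical names):

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Hypothetical schema: CREATE TABLE events.raw (id text PRIMARY KEY, body text)
messages
  .map(msg => (java.util.UUID.randomUUID().toString, msg))  // assign a synthetic row key
  .saveToCassandra("events", "raw", SomeColumns("id", "body"))
```

Other batch or streaming jobs can then read the same table independently with `sc.cassandraTable("events", "raw")`, which is what makes the external-store route work across separate SparkContexts: RDDs themselves cannot be shared between applications.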


Oded Maimon
