spark-user mailing list archives

From Oded Maimon <>
Subject Spark Streaming - use the data in different jobs
Date Sun, 18 Oct 2015 11:49:11 GMT
We've built a Spark Streaming process that gets data from a pub/sub system
(RabbitMQ in our case).

Now we want the streamed data to be used in different Spark jobs (also in
real time, if possible).

What options do we have for doing that?

   - Can the streaming process and different Spark jobs share/access the
   same RDDs?
   - Can the streaming process create a Spark SQL table that other jobs
   read/use?
   - Can a Spark Streaming process trigger other Spark jobs and send them
   the data (in memory)?
   - Can a Spark Streaming process cache the data in memory so that other
   scheduled jobs access the same RDDs?
   - Should we persist the data to HBase and read it from other jobs?
   - Are there other ways?

I believe the answer will be to use an external DB/storage, but I'm hoping
there is a different solution :)


Oded Maimon


