spark-user mailing list archives

From huylv <>
Subject Where to save intermediate results?
Date Thu, 28 Aug 2014 20:30:07 GMT

I'm building a system for near-real-time data analytics. My plan is to have
an ETL batch job that computes aggregations, running periodically. User
queries are then parsed into on-demand calculations, also in Spark. Where
should the pre-calculated results be saved? Once the aggregations finish,
the ETL job terminates, so its caches are wiped from memory. How can the
on-demand queries make use of these results? Or, more generally, could you
suggest a good way to organize the data flow and jobs to achieve this?
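(For context, one common pattern for this scenario is to have the batch job persist its aggregates to durable storage, e.g. Parquet files on HDFS, which a separate query job then loads. The sketch below illustrates that idea using Spark's DataFrame API; the column names and the HDFS path are hypothetical, not from this thread.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// --- In the periodic ETL job ---
val spark = SparkSession.builder.appName("etl").getOrCreate()
import spark.implicits._

// Hypothetical input: raw events with a product id and an amount.
val events = Seq(("p1", 10.0), ("p1", 5.0), ("p2", 7.0))
  .toDF("productId", "amount")

// Pre-compute the aggregates and write them to durable storage;
// the ETL job can then terminate without losing the results.
events.groupBy("productId")
  .agg(sum("amount").as("total"))
  .write.mode("overwrite")
  .parquet("hdfs:///warehouse/aggregates")  // hypothetical path

// --- In a separate, on-demand query job ---
val precomputed = spark.read.parquet("hdfs:///warehouse/aggregates")
precomputed.filter($"total" > 10).show()
```

Alternatives with the same shape: write the aggregates to an external store such as HBase or Cassandra, or keep one long-running Spark application alive so the cached results stay in memory between queries.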

I'm new to Spark, so sorry if this sounds like a dumb question.

Thank you.
