spark-user mailing list archives

From Gourav Sengupta
Subject storing query object
Date Tue, 19 Jan 2016 11:24:44 GMT

I have a Spark table (created from hiveContext) with a couple of hundred
partitions and a few thousand files.

When I run a query on the table, Spark spends a lot of time (as seen in
the pyspark output) collecting these files from the partitions.
Only after this does the query start running.

Is there a way to store the object that holds all these collected partitions
and files, so that every time I restart the job I can load this object instead
of taking 50 minutes just to collect the files before starting to run the
query?

Please do let me know in case the question is not quite clear.

Gourav Sengupta
