spark-dev mailing list archives

From Michael Armbrust <>
Subject Re: Caching tables at column level
Date Sun, 01 Feb 2015 21:27:20 GMT
It's not completely transparent, but you can do something like the following:

CACHE TABLE hotData AS SELECT columns, I, care, about FROM fullTable
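The same projection-then-cache idea can also be expressed through the Scala API; a minimal sketch, assuming a live `SQLContext` (the `hotData`/`fullTable` names and columns are the illustrative ones from the SQL above):

```scala
// Sketch only: assumes a running Spark SQLContext (1.2-era API).
// Select just the hot columns, then cache that projection rather
// than the full 600-column table.
val hot = sqlContext.sql("SELECT columns, I, care, about FROM fullTable")
hot.registerTempTable("hotData")   // expose the projection under a table name
sqlContext.cacheTable("hotData")   // cache only those columns in memory
```

Queries against `hotData` then hit the in-memory columnar cache, while the wide `fullTable` stays uncached.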

On Sun, Feb 1, 2015 at 3:03 AM, Mick Davies <> wrote:

> I have been working a lot recently with denormalised tables with lots of
> columns, nearly 600. We are using this form to avoid joins.
> I have tried to use cache table with this data, but it proves too expensive
> as it seems to try to cache all the data in the table.
> For data sets such as the one I am using, you find that certain columns are
> hot (referenced frequently in queries), while others are used very
> infrequently.
> Therefore it would be great if caches could be column based. I realise that
> this may not be optimal for all use cases, but I think it could be quite a
> common need.  Has something like this been considered?
> Thanks Mick
