spark-dev mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Caching tables at column level
Date Sun, 01 Feb 2015 21:27:20 GMT
It's not completely transparent, but you can do something like the following
today:

CACHE TABLE hotData AS SELECT columns, I, care, about FROM fullTable
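
To spell that out: the idea is to materialize only the hot columns under a new name and cache that projection, rather than caching the full 600-column table. A minimal sketch (table and column names here are illustrative placeholders, not from the original thread):

```sql
-- Cache only the frequently referenced columns under a separate name
CACHE TABLE hotData AS SELECT colA, colB, colC FROM fullTable;

-- Subsequent queries against hotData read from the in-memory columnar cache
SELECT colA, COUNT(*) FROM hotData GROUP BY colA;

-- Drop the cached projection when it is no longer needed
UNCACHE TABLE hotData;
```

Queries that need the cold columns would still go against fullTable directly, so this is a manual split rather than transparent column-level caching.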

On Sun, Feb 1, 2015 at 3:03 AM, Mick Davies <michael.belldavies@gmail.com>
wrote:

> I have been working a lot recently with denormalised tables with lots of
> columns, nearly 600. We are using this form to avoid joins.
>
> I have tried to use cache table with this data, but it proves too expensive
> as it seems to try to cache all the data in the table.
>
> For data sets such as the one I am using you find that certain columns will
> be hot, referenced frequently in queries, others will be used very
> infrequently.
>
> Therefore it would be great if caches could be column based. I realise that
> this may not be optimal for all use cases, but I think it could be quite a
> common need.  Has something like this been considered?
>
> Thanks Mick
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Caching-tables-at-column-level-tp10377.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>
