spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Running spark function on parquet without sql
Date Sun, 15 Mar 2015 17:35:12 GMT
That's an unfortunate documentation bug in the programming guide... We 
failed to update it after making the change.

Cheng

On 2/28/15 8:13 AM, Deborah Siegel wrote:
> Hi Michael,
>
> Would you help me understand the apparent difference here?
>
> The Spark 1.2.1 programming guide indicates:
>
> "Note that if you call |schemaRDD.cache()| rather than 
> |sqlContext.cacheTable(...)|, tables will /not/ be cached using the 
> in-memory columnar format, and therefore 
> |sqlContext.cacheTable(...)| is strongly recommended for this use case."
>
> Yet the API doc shows:
>
>
>         def cache(): SchemaRDD.this.type
>         <https://spark.apache.org/docs/1.2.0/api/scala/org/apache/spark/sql/SchemaRDD.html>
>
>
>         Overridden cache function will always use the in-memory
>         columnar caching.
>
>
>
> links
> https://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory
> https://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD
>
> Thanks
> Sincerely
> Deb
>
> On Fri, Feb 27, 2015 at 2:13 PM, Michael Armbrust 
> <michael@databricks.com <mailto:michael@databricks.com>> wrote:
>
>         From Zhan Zhang's reply, yes I still get the parquet's advantage.
>
>     You will need to at least use SQL or the DataFrame API (coming in
>     Spark 1.3) to specify the columns that you want in order to get
>     the parquet benefits. The rest of your operations can be
>     standard Spark.
>
>         My next question is: if I operate on a SchemaRDD, will I get the
>         advantage of Spark SQL's in-memory columnar store when I cache
>         the table using cacheTable()?
>
>
>     Yes, as of Spark 1.2, SchemaRDDs always use the in-memory columnar
>     cache for both cacheTable and .cache().
>
>
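
For readers following the thread, here is a minimal sketch of the pattern discussed above, written against the Spark 1.2-era SchemaRDD API. It selects columns through SQL, caches the result, and continues with standard RDD operations. The Parquet path, the table names (`events`, `pairs`), and the column names (`user_id`, `amount`) are hypothetical and only serve as an illustration; it has not been run against a real cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed for reduceByKey in Spark 1.2)
import org.apache.spark.sql.SQLContext

object ColumnarCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("columnar-cache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical Parquet input; parquetFile returns a SchemaRDD in Spark 1.2.
    val events = sqlContext.parquetFile("/tmp/events.parquet")
    events.registerTempTable("events")

    // Name the columns you need through SQL, as suggested in the thread,
    // rather than mapping over the full rows.
    val pairs = sqlContext.sql("SELECT user_id, amount FROM events")

    // Per the thread, either call below uses the in-memory columnar
    // format as of Spark 1.2; pick one.
    pairs.registerTempTable("pairs")
    sqlContext.cacheTable("pairs")
    // pairs.cache()

    // The remaining operations are standard Spark: a SchemaRDD is an RDD[Row].
    // Assumes user_id is a string column and amount is a double column.
    val totals = pairs
      .map(row => (row.getString(0), row.getDouble(1)))
      .reduceByKey(_ + _)

    totals.take(10).foreach(println)

    sqlContext.uncacheTable("pairs")
    sc.stop()
  }
}
```

In Spark 1.3 and later, SchemaRDD is replaced by the DataFrame API mentioned above, so the method names in this sketch would need adjusting.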

