spark-user mailing list archives

From "Taotao.Li" <charles.up...@gmail.com>
Subject Re: Will spark cache table once even if I call read/cache on the same table multiple times
Date Sun, 20 Nov 2016 11:18:22 GMT
Hi, you can check my Stack Overflow question:
http://stackoverflow.com/questions/36195105/what-happens-if-i-cache-the-same-rdd-twice-in-spark/36195812#36195812
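
As for the global Map[String,DataFrame] idea in the original question quoted below, here is a rough sketch of what such a registry might look like. It is generic, so the snippet runs without Spark. The names TableRegistry, getOrLoad and the loader function are hypothetical and not part of any Spark API. In real code the loader would presumably be something like name => sqlContext.table(name).cache().

```scala
import scala.collection.mutable

// Hypothetical helper, not part of Spark: keeps one value per table name.
final class TableRegistry[T](load: String => T) {
  private val tables = mutable.HashMap.empty[String, T]

  // Runs `load` only the first time a name is requested; later calls
  // return the stored value. synchronized guards concurrent callers.
  def getOrLoad(name: String): T = synchronized {
    tables.getOrElseUpdate(name, load(name))
  }
}

object TableRegistryDemo {
  def main(args: Array[String]): Unit = {
    var loads = 0
    // Stand-in loader for illustration; with Spark this might be
    // name => sqlContext.table(name).cache()
    val registry = new TableRegistry[String]({ name => loads += 1; s"df:$name" })
    val a = registry.getOrLoad("employee")
    val b = registry.getOrLoad("employee")
    // Both lookups return the same stored object; the loader ran once.
    println(s"same instance: ${a eq b}, loads: $loads")
  }
}
```

With this shape every module asks the registry for "employee" and gets the same object back, so cache() is called on a single DataFrame instead of on a fresh one per module.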

On Sat, Nov 19, 2016 at 3:16 AM, Rabin Banerjee <
dev.rabin.banerjee@gmail.com> wrote:

> Hi Yong,
>
>   But every time val tabdf = sqlContext.table(tablename) is called, tabdf.rdd
> gets a new id, which can be checked by calling tabdf.rdd.id.
> And,
> https://github.com/apache/spark/blob/b6de0c98c70960a97b07615b0b08fbd8f900fbe7/core/src/main/scala/org/apache/spark/SparkContext.scala#L268
>
> Spark maintains a map of [RDD_ID, RDD]. Since the RDD id keeps changing,
> will Spark cache the same data again and again?
>
> For example ,
>
> val tabdf = sqlContext.table("employee")
> tabdf.cache()
> tabdf.someTransformation.someAction
> println(tabdf.rdd.id)
> val tabdf1 = sqlContext.table("employee")
> tabdf1.cache() // <= *Will Spark go to disk again and load the data into
> //               memory, or look in the cache?*
> tabdf1.someTransformation.someAction
> println(tabdf1.rdd.id)
>
> Regards,
> R Banerjee
>
>
>
>
> On Fri, Nov 18, 2016 at 9:14 PM, Yong Zhang <java8964@hotmail.com> wrote:
>
>> That's correct, as long as you don't change the StorageLevel.
>>
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L166
>>
>>
>>
>> Yong
>>
>> ------------------------------
>> *From:* Rabin Banerjee <dev.rabin.banerjee@gmail.com>
>> *Sent:* Friday, November 18, 2016 10:36 AM
>> *To:* user; Mich Talebzadeh; Tathagata Das
>> *Subject:* Will spark cache table once even if I call read/cache on the
>> same table multiple times
>>
>> Hi All ,
>>
>>   I am working on a project where the code is divided into multiple reusable
>> modules. I am not able to understand how Spark persist/cache behaves in that
>> context.
>>
>> My question is: will Spark cache a table only once even if I call read/cache
>> on the same table multiple times?
>>
>>  Sample Code ::
>>
>>   TableReader::
>>
>>    def getTableDF(tablename: String, persist: Boolean = false): DataFrame = {
>>      val tabdf = sqlContext.table(tablename)
>>      // Mark the DataFrame for caching only when the caller asks for it
>>      if (persist) {
>>        tabdf.cache()
>>      }
>>      tabdf
>>    }
>>
>>  Now
>> Module1::
>>  val emp = TableReader.getTableDF("employee")
>>  emp.someTransformation.someAction
>>
>> Module2::
>>  val emp = TableReader.getTableDF("employee")
>>  emp.someTransformation.someAction
>>
>> ....
>>
>> ModuleN::
>>  val emp = TableReader.getTableDF("employee")
>>  emp.someTransformation.someAction
>>
>> Will Spark cache the emp table once, or will it cache it every time I
>> call this? Should I maintain a global hash map to handle that, something
>> like Map[String,DataFrame]?
>>
>>  Regards,
>> Rabin Banerjee
>>
>>
>>
>>
>


-- 
*___________________*
Quant | Engineer | Boy
*___________________*
*blog*:    http://litaotao.github.io
*github*: www.github.com/litaotao
