spark-issues mailing list archives

From "Roberto Mirizzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21478) Unpersist a DF also unpersists related DFs
Date Thu, 20 Jul 2017 18:51:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095187#comment-16095187 ]

Roberto Mirizzi commented on SPARK-21478:
-----------------------------------------

It's odd that you are not able to reproduce it. 
Did you just launch the spark-shell and copy/paste the above commands? 
I tried on Spark 2.1.0, 2.1.1 and 2.2.0, both on AWS and on my local machine. 
Spark 2.1.0 doesn't exhibit the issue; Spark 2.1.1 and 2.2.0 fail the last assertion.

About your point "I do think one wants to be able to persist the result and not the original
though", it depends on the specific use case. My example was a trivial one to reproduce the
problem, but as you can imagine you may want to persist the first DF, do a bunch of operations
reusing it, and then generate a new DF, persist it, and unpersist the old one when you don't
need it anymore.
It looks like a serious problem to me. 
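
A rough sketch of that use case, for illustration only. The object name `PersistHandoff`, the helper `run`, and the column names are hypothetical and not part of any real job; the sketch only assumes the standard `persist`, `unpersist` and `storageLevel` calls on a DataFrame:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; all names here are illustrative only.
object PersistHandoff {

  // Returns (baseStillCached, derivedStillCached) after the base DF is unpersisted.
  def run(spark: SparkSession): (Boolean, Boolean) = {
    import spark.implicits._

    // Persist the first DF and reuse it for a couple of operations.
    val base = Seq(1, 2, 3).toDF("value")
    base.persist()
    base.count()
    base.filter($"value" > 1).count()

    // Generate a new DF from it, persist it, and materialize it.
    val derived = base.select(($"value" * 2).as("doubled"))
    derived.persist()
    derived.count()

    // The old DF is no longer needed, so release it.
    base.unpersist()

    (base.storageLevel.useMemory, derived.storageLevel.useMemory)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("persist-handoff-sketch")
      .getOrCreate()

    val (baseCached, derivedCached) = run(spark)
    // The expectation is that only the derived DF remains cached.
    println(s"Spark ${spark.version}: base cached = $baseCached, derived cached = $derivedCached")

    spark.stop()
  }
}
```

If the behavior reported in this issue applies, the second flag would come back `false` on 2.1.1 and 2.2.0, which is exactly what makes this pattern unsafe.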

This is my entire output:

{code:java}
$ spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/20 11:44:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/20 11:44:44 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.15.16.46:4040
Spark context available as 'sc' (master = local[*], app id = local-1500576281010).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_71)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val x1 = Seq(1).toDF()
x1: org.apache.spark.sql.DataFrame = [value: int]

scala> x1.persist()
res0: x1.type = [value: int]

scala> x1.count()
res1: Long = 1

scala> assert(x1.storageLevel.useMemory)

scala> 

scala> val x11 = x1.select($"value" * 2)
x11: org.apache.spark.sql.DataFrame = [(value * 2): int]

scala> x11.persist()
res3: x11.type = [(value * 2): int]

scala> x11.count()
res4: Long = 1

scala> assert(x11.storageLevel.useMemory)

scala> 

scala> x1.unpersist()
res6: x1.type = [value: int]

scala> 

scala> assert(!x1.storageLevel.useMemory)

scala> //the following assertion FAILS

scala> assert(x11.storageLevel.useMemory)
java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  ... 48 elided

scala> 
{code}


> Unpersist a DF also unpersists related DFs
> ------------------------------------------
>
>                 Key: SPARK-21478
>                 URL: https://issues.apache.org/jira/browse/SPARK-21478
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Roberto Mirizzi
>
> Starting with Spark 2.1.1 I observed this bug. Here are the steps to reproduce it:
> # create a DF
> # persist it
> # count the items in it
> # create a new DF as a transformation of the previous one
> # persist it
> # count the items in it
> # unpersist the first DF
> Once you do that, you will see that the 2nd DF is gone as well.
> The code to reproduce it is:
> {code:java}
> val x1 = Seq(1).toDF()
> x1.persist()
> x1.count()
> assert(x1.storageLevel.useMemory)
> val x11 = x1.select($"value" * 2)
> x11.persist()
> x11.count()
> assert(x11.storageLevel.useMemory)
> x1.unpersist()
> assert(!x1.storageLevel.useMemory)
> //the following assertion FAILS
> assert(x11.storageLevel.useMemory)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
