spark-issues mailing list archives

From "Xiao Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21579) dropTempView has a critical BUG
Date Tue, 01 Aug 2017 06:20:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108440#comment-16108440 ]

Xiao Li commented on SPARK-21579:
---------------------------------

As I said in the PR, correctness is more important for us. The cached plan will be
reused when we build any other plans, so users might see out-of-date results.
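
The plan-reuse concern can be illustrated with a toy model. This is only a sketch: the types `Plan`, `Scan`, `Filter`, `CachedData` and the function `useCachedData` are hypothetical names invented for illustration and are not Spark classes or Spark's actual cache implementation. It shows that a freshly built query with the same structure as a cached plan is answered from the cache, which is why a cache entry left behind after its source is dropped could be served to an unrelated query.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class CachedData(label: String) extends Plan

object PlanMatching {
  // Replace any subtree that equals a cached plan with a reference to its cached data.
  def useCachedData(plan: Plan, cache: Map[Plan, String]): Plan =
    cache.get(plan) match {
      case Some(label) => CachedData(label)
      case None =>
        plan match {
          case Filter(cond, child) => Filter(cond, useCachedData(child, cache))
          case other               => other
        }
    }

  def main(args: Array[String]): Unit = {
    val dwd1 = Filter("age >= 25", Scan("ods_table"))
    val dwd2 = Filter("name = 'p1'", dwd1)
    val cache = Map[Plan, String](dwd1 -> "dwd_table1", dwd2 -> "dwd_table2")

    // A new query that never mentions dwd_table2 but has the same structure.
    val query = Filter("name = 'p1'", Filter("age >= 25", Scan("ods_table")))
    println(useCachedData(query, cache)) // prints CachedData(dwd_table2)
  }
}
```

In this model the match is purely structural, so the only way to guarantee fresh results after a source changes is to invalidate every cache entry whose plan contains that source.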

Achieving what you want would require introducing a new concept, such as materialized views,
which would not be used by our plan matching during query execution.

> dropTempView has a critical BUG
> -------------------------------
>
>                 Key: SPARK-21579
>                 URL: https://issues.apache.org/jira/browse/SPARK-21579
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: ant_nebula
>            Priority: Critical
>         Attachments: screenshot-1.png
>
>
> When I call dropTempView on dwd_table1 only, the dependent table dwd_table2 also disappears from http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
>
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> // Cache the base table, then two derived tables that build on each other.
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> // Drop only the intermediate view.
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
> It will keep ods_table1 in memory even though it will not be used anymore. This wastes memory,
> especially when my service diagram is much more complex.
> !screenshot-1.png!
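
As an alternative to watching the storage page, the cache state described in the report could be checked programmatically. The following is a minimal sketch, assuming a Spark 2.x dependency on the classpath; the object name `DropTempViewCheck` and the shortened sample data are illustrative and not part of the report. `spark.catalog.isCached` and `spark.catalog.dropTempView` are existing Catalog methods. No expected output is given because the result depends on the Spark version in use.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the original report.
object DropTempViewCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("dropTempViewCheck").getOrCreate()
    import spark.implicits._

    // Shortened sample data; the report uses six rows with the same two columns.
    Seq(("p1", 30), ("p2", 20), ("p3", 25)).toDF("name", "age")
      .createOrReplaceTempView("ods_table")
    spark.sql("cache table ods_table")
    spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
    spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")

    // Record whether dwd_table2 is cached before and after dropping dwd_table1.
    val cachedBefore = spark.catalog.isCached("dwd_table2")
    spark.catalog.dropTempView("dwd_table1")
    val cachedAfter = spark.catalog.isCached("dwd_table2")

    println("dwd_table2 cached before drop: " + cachedBefore)
    println("dwd_table2 cached after drop:  " + cachedAfter)

    spark.stop()
  }
}
```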



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

