spark-issues mailing list archives

From "chillon_m (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-13614) show() trigger memory leak,why?
Date Thu, 03 Mar 2016 02:15:18 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175516#comment-15175516
] 

chillon_m edited comment on SPARK-13614 at 3/3/16 2:14 AM:
-----------------------------------------------------------

@[~srowen]
With the same dataset (hot.count() = 599147, ghot.size = 21844), collect() does not trigger the
memory leak (first image), but show() does. Why? In general, collect() should trigger it more
easily ("Keep in mind that your entire dataset must fit in memory on a single machine to use
collect() on it, so collect() shouldn’t be used on large datasets." in Learning Spark), yet
collect() does not trigger it.





> show() trigger memory leak,why?
> -------------------------------
>
>                 Key: SPARK-13614
>                 URL: https://issues.apache.org/jira/browse/SPARK-13614
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: chillon_m
>         Attachments: memory leak.png, memory.png
>
>
> hot.count()=599147
> ghot.size=21844
> [bigdata@namenode spark-1.5.2-bin-hadoop2.4]$ bin/spark-shell --driver-class-path /home/bigdata/mysql-connector-java-5.1.38-bin.jar

> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> SQL context available as sqlContext.
> scala> val hot=sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://:/?user=&password=","dbtable" -> "")).load()
> Wed Mar 02 14:22:37 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> hot: org.apache.spark.sql.DataFrame = []
> scala> val ghot=hot.groupBy("Num","pNum").count().collect()
> Wed Mar 02 14:22:59 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> ghot: Array[org.apache.spark.sql.Row] = Array([[],[],[], [,42310...
> scala> ghot.take(20)
> res0: Array[org.apache.spark.sql.Row] = Array([],[],[],[],[],[],[],[]....)
> scala> hot.groupBy("Num","pNum").count().show()
> Wed Mar 02 14:26:05 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> 16/03/02 14:26:33 ERROR Executor: Managed memory leak detected; size = 4194304 bytes, TID = 202
> +----------+---------+-----+
> |     QQNum| TroopNum|count|
> +----------+---------+-----+
> |1XXXXXXXXX|38XXXXXXX|    1|
> |1XXXXXXXXX| 5XXXXXXX|    2|
> |1XXXXXXXXX|26XXXXXXX|    6|
> |1XXXXXXXXX|14XXXXXXX|    3|
> |1XXXXXXXXX|41XXXXXXX|   14|
> |1XXXXXXXXX|48XXXXXXX|   18|
> |1XXXXXXXXX|23XXXXXXX|    2|
> |1XXXXXXXXX|  XXXXXXX|   34|
> |1XXXXXXXXX|52XXXXXXX|    1|
> |1XXXXXXXXX|52XXXXXXX|    2|
> |1XXXXXXXXX|49XXXXXXX|    3|
> |1XXXXXXXXX|42XXXXXXX|    3|
> |1XXXXXXXXX|17XXXXXXX|   11|
> |1XXXXXXXXX|25XXXXXXX|  129|
> |1XXXXXXXXX|13XXXXXXX|    2|
> |1XXXXXXXXX|19XXXXXXX|    1|
> |1XXXXXXXXX|32XXXXXXX|    9|
> |1XXXXXXXXX|38XXXXXXX|    6|
> |1XXXXXXXXX|38XXXXXXX|   13|
> |1XXXXXXXXX|30XXXXXXX|    4|
> +----------+---------+-----+
> only showing top 20 rows
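
For comparison, here is a minimal spark-shell sketch of the two calls used in the session above. It assumes Spark 1.5.x, where `sc` and `sqlContext` are predefined by the shell. The small in-memory DataFrame `sample` is a hypothetical stand-in for the JDBC-backed `hot` table, and the names `grouped`, `all` and `firstRows` are illustrative only. The sketch shows only that collect() returns every grouped row to the driver, while show() prints just the first 20 rows by default. It is not a diagnosis of the logged "Managed memory leak detected" message.

```scala
// Run inside spark-shell (Spark 1.5.x); `sc` and `sqlContext` come from the shell.
import sqlContext.implicits._

// Hypothetical stand-in for the JDBC table `hot` from the report.
val sample = sc.parallelize(Seq((1, 10), (1, 10), (2, 20), (3, 30))).toDF("Num", "pNum")

// Same aggregation as in the report: one output row per (Num, pNum) pair.
val grouped = sample.groupBy("Num", "pNum").count()

// collect() brings every result row back to the driver as an Array[Row].
val all = grouped.collect()

// take(20) returns at most the first 20 rows.
val firstRows = grouped.take(20)

// show() with no argument prints at most the first 20 rows as a text table.
grouped.show()
```

With the real table, `all` would hold all 21844 grouped rows, whereas show() only needs enough rows to print the first 20.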



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
