Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 3 Mar 2016 02:15:18 +0000 (UTC)
From: "chillon_m (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.12946066.1456901302000.203265.1456971318623@Atlassian.JIRA>
In-Reply-To: <JIRA.12946066.1456901302000@Atlassian.JIRA>
References: <JIRA.12946066.1456901302000@Atlassian.JIRA>
 <JIRA.12946066.1456901302644@arcas>
Subject: [jira] [Comment Edited] (SPARK-13614) show() trigger memory
 leak,why?
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/SPARK-13614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175516#comment-15175516 ]

chillon_m edited comment on SPARK-13614 at 3/3/16 2:14 AM:
-----------------------------------------------------------

@[~srowen]
For datasets of the same size (hot.count()=599147, ghot.size=21844), collect() doesn't trigger the memory leak (first image), but show() does. Why? In general, collect() should trigger it more easily ("Keep in mind that your entire dataset must fit in memory on a single machine to use collect() on it, so collect() shouldn’t be used on large datasets." in <Learning Spark>), yet here collect() doesn't trigger it.


was (Author: chillon_m):
[~srowen]
For datasets of the same size (hot.count()=599147, ghot.size=21844), collect() doesn't trigger the memory leak (first image), but show() does. Why? In general, collect() should trigger it more easily ("Keep in mind that your entire dataset must fit in memory on a single machine to use collect() on it, so collect() shouldn’t be used on large datasets." in <Learning Spark>), yet here collect() doesn't trigger it.


> show() trigger memory leak,why?
> -------------------------------
>
>                 Key: SPARK-13614
>                 URL: https://issues.apache.org/jira/browse/SPARK-13614
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: chillon_m
>         Attachments: memory leak.png, memory.png
>
>
> hot.count()=599147
> ghot.size=21844
> [bigdata@namenode spark-1.5.2-bin-hadoop2.4]$ bin/spark-shell --driver-class-path /home/bigdata/mysql-connector-java-5.1.38-bin.jar
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> SQL context available as sqlContext.
> scala> val hot=sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://:/?user=&password=","dbtable" -> "")).load()
> Wed Mar 02 14:22:37 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> hot: org.apache.spark.sql.DataFrame = []
> scala> val ghot=hot.groupBy("Num","pNum").count().collect()
> Wed Mar 02 14:22:59 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> ghot: Array[org.apache.spark.sql.Row] = Array([[],[],[], [,42310...
> scala> ghot.take(20)
> res0: Array[org.apache.spark.sql.Row] = Array([],[],[],[],[],[],[],[]....)
> scala> hot.groupBy("Num","pNum").count().show()
> Wed Mar 02 14:26:05 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> 16/03/02 14:26:33 ERROR Executor: Managed memory leak detected; size = 4194304 bytes, TID = 202
> +----------+---------+-----+
> |     QQNum| TroopNum|count|
> +----------+---------+-----+
> |1XXXXXXXXX|38XXXXXXX|    1|
> |1XXXXXXXXX| 5XXXXXXX|    2|
> |1XXXXXXXXX|26XXXXXXX|    6|
> |1XXXXXXXXX|14XXXXXXX|    3|
> |1XXXXXXXXX|41XXXXXXX|   14|
> |1XXXXXXXXX|48XXXXXXX|   18|
> |1XXXXXXXXX|23XXXXXXX|    2|
> |1XXXXXXXXX|  XXXXXXX|   34|
> |1XXXXXXXXX|52XXXXXXX|    1|
> |1XXXXXXXXX|52XXXXXXX|    2|
> |1XXXXXXXXX|49XXXXXXX|    3|
> |1XXXXXXXXX|42XXXXXXX|    3|
> |1XXXXXXXXX|17XXXXXXX|   11|
> |1XXXXXXXXX|25XXXXXXX|  129|
> |1XXXXXXXXX|13XXXXXXX|    2|
> |1XXXXXXXXX|19XXXXXXX|    1|
> |1XXXXXXXXX|32XXXXXXX|    9|
> |1XXXXXXXXX|38XXXXXXX|    6|
> |1XXXXXXXXX|38XXXXXXX|   13|
> |1XXXXXXXXX|30XXXXXXX|    4|
> +----------+---------+-----+
> only showing top 20 rows
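
One possible reading of the transcript above, offered as an assumption and not as a confirmed diagnosis: show() prints only the top 20 rows, so it may stop reading the aggregated result early, whereas collect() reads every row. If the aggregation releases its buffer only once its last row has been read, stopping early would leave that buffer to be reclaimed at task cleanup, which could be what the executor reports. The sketch below is a toy model in plain Scala, not Spark code; BufferedRows and released are hypothetical names invented for illustration.

```scala
// Toy model only; this is NOT Spark's implementation.
// Assumption: the buffer is released only when the consumer reads past the last row.
final class BufferedRows(n: Int) extends Iterator[Int] {
  private var pos = 0
  var released = false              // stands in for "managed memory freed"

  def hasNext: Boolean = {
    val more = pos < n
    if (!more) released = true      // freed only once the end is reached
    more
  }

  def next(): Int = { val v = pos; pos += 1; v }
}

// collect()-like consumer: drains the whole iterator.
val full = new BufferedRows(21844)
val allRows = full.toArray

// show()-like consumer: stops after the first 20 rows.
val partial = new BufferedRows(21844)
val topRows = partial.take(20).toArray

println(s"collect-like: rows=${allRows.length}, released=${full.released}")
println(s"show-like:    rows=${topRows.length}, released=${partial.released}")
```

In this model the collect-like run ends with released=true and the show-like run with released=false. Whether Spark 1.5.2 behaves this way for the query above is not established by the transcript; running hot.groupBy("Num","pNum").count().limit(20).collect() in the same session would be one way to check whether the early stop, and not show() itself, is what matters.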


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
