spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-16333) Excessive Spark history event/json data size (5GB each)
Date Sun, 03 Jul 2016 11:34:11 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360500#comment-15360500
] 

Sean Owen commented on SPARK-16333:
-----------------------------------

[~cloud_fan] [~andrewor14] I wonder if you know anything about this. I'm trying to figure
out how so many block status elements ended up in the JSON metrics in 2.x. I traced
this back to JsonProtocol, and although it doesn't look like either of you added these elements,
you may have touched related code and so may know whether this looks normal.
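
To check where the bytes in one of these event files actually go, a minimal sketch follows. It assumes the event log is newline-delimited JSON with an "Event" field on each line, and that block updates appear as a list under a key named "Updated Blocks". That key name and the function names are assumptions for illustration and are not confirmed in this thread.

```python
import json
from collections import Counter


def count_updated_blocks(node):
    """Recursively count list entries under any "Updated Blocks" key (assumed name)."""
    if isinstance(node, dict):
        total = 0
        for key, value in node.items():
            if key == "Updated Blocks" and isinstance(value, list):
                total += len(value)
            else:
                total += count_updated_blocks(value)
        return total
    if isinstance(node, list):
        return sum(count_updated_blocks(item) for item in node)
    return 0


def summarize_event_log(lines):
    """Return (bytes per event type, total number of updated-block entries).

    `lines` is any iterable of strings, one JSON event per line, so an open
    file handle can be streamed without loading a multi-GB log into memory.
    """
    bytes_by_event = Counter()
    updated_blocks = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        name = event.get("Event", "<unknown>")
        bytes_by_event[name] += len(line.encode("utf-8"))
        updated_blocks += count_updated_blocks(event)
    return bytes_by_event, updated_blocks
```

Passing an open file handle for one of the app-... files to summarize_event_log and printing bytes_by_event.most_common(5) would show which event types dominate the file, and whether the block entries account for the growth.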

> Excessive Spark history event/json data size (5GB each)
> -------------------------------------------------------
>
>                 Key: SPARK-16333
>                 URL: https://issues.apache.org/jira/browse/SPARK-16333
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>         Environment: this is seen on both x86 (Intel(R) Xeon(R) E5-2699) and ppc platform
> (Habanero, Model: 8348-21C), Red Hat Enterprise Linux Server release 7.2 (Maipo), Spark2.0.0-preview
> (May-24, 2016 build)
>            Reporter: Peter Liu
>              Labels: performance, spark2.0.0
>
> With Spark2.0.0-preview (May-24 build), the history event data (the JSON file) that
> is generated for each Spark application run (see below) can be as large as 5GB, compared with
> 14 MB for exactly the same application run and the same 1TB of input data under Spark1.6.1:
> -rwxrwx--- 1 root root 5.3G Jun 30 09:39 app-20160630091959-0000
> -rwxrwx--- 1 root root 5.3G Jun 30 09:56 app-20160630094213-0000
> -rwxrwx--- 1 root root 5.3G Jun 30 10:13 app-20160630095856-0000
> -rwxrwx--- 1 root root 5.3G Jun 30 10:30 app-20160630101556-0000
> The test is done with Sparkbench V2, SQL RDD (see github: https://github.com/SparkTC/spark-bench)




