spark-issues mailing list archives

From "Antony Mayi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-12511) streaming driver with checkpointing unable to finalize leading to OOM
Date Wed, 23 Dec 2015 22:26:46 GMT

     [ https://issues.apache.org/jira/browse/SPARK-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antony Mayi updated SPARK-12511:
--------------------------------
    Attachment: finalizer-spark_assembly.png
                finalizer-pending.png
                finalizer-classes.png

> streaming driver with checkpointing unable to finalize leading to OOM
> ---------------------------------------------------------------------
>
>                 Key: SPARK-12511
>                 URL: https://issues.apache.org/jira/browse/SPARK-12511
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.2
>         Environment: pyspark 1.5.2
> yarn 2.6.0
> python 2.6
> centos 6.5
> openjdk 1.8.0
>            Reporter: Antony Mayi
>            Priority: Critical
>         Attachments: finalizer-classes.png, finalizer-pending.png, finalizer-spark_assembly.png
>
>
> When configured with checkpointing, a Spark Streaming application fills the driver's heap
> with ZipFileInputStream instances, as a result of spark-assembly.jar (and possibly other jars,
> for example snappy-java.jar) being repeatedly referenced (loaded?). The Java Finalizer cannot
> finalize these ZipFileInputStream instances; they eventually take up the whole heap and the
> driver crashes with OOM.
> h2. Steps to reproduce:
> * Submit the attached bug.py to Spark.
> * Leave it running and monitor the heap of the driver Java process.
> ** In a heap dump you will primarily see a growing number of byte array instances (here accumulating the zip payload of the jar references):
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:         32653       32735296  [B
>    2:         48000        5135816  [C
>    3:            41        1344144  [Lscala.concurrent.forkjoin.ForkJoinTask;
>    4:         11362        1261816  java.lang.Class
>    5:         47054        1129296  java.lang.String
>    6:         25460        1018400  java.lang.ref.Finalizer
>    7:          9802         789400  [Ljava.lang.Object;
> {noformat}
> ** In VisualVM you can see:
> *** an increasing number of objects pending finalization
> *** an increasing number of ZipFileInputStream instances related to spark-assembly.jar, referenced by the Finalizer
> * Depending on the heap size and running time, this eventually leads to an OOM crash of the driver.
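One way to track this growth over time is to take periodic class histograms of the driver and compare them. On OpenJDK 8, `jmap -histo <driver-pid>` prints a histogram in the layout of the table above. The following is a minimal sketch and is not part of the original report. `parse_histogram` and `growing_classes` are hypothetical helper names, and the row format is assumed from the table above.

```python
import re

# Assumed row layout (OpenJDK 8 class histogram), e.g.:
#    1:         32653       32735296  [B
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_histogram(text):
    """Map class name -> (instances, bytes) for each data row; headers and totals are skipped."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return counts


def growing_classes(snapshots, watch=("[B", "java.lang.ref.Finalizer")):
    """Return the watched classes whose instance count strictly increases across snapshots."""
    parsed = [parse_histogram(s) for s in snapshots]
    result = []
    for name in watch:
        # A class missing from a snapshot counts as zero instances.
        series = [p.get(name, (0, 0))[0] for p in parsed]
        if len(series) > 1 and all(a < b for a, b in zip(series, series[1:])):
            result.append(name)
    return result
```

Feeding it histograms captured a few minutes apart should flag `[B` and `java.lang.ref.Finalizer` if they keep growing as described. A single snapshot is never reported, because growth needs at least two data points.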
> h2. Comments
> * bug.py is a lightweight demonstration of the problem. In production the effect is quite rapid: within a few hours it consumes gigabytes of heap and kills the application.
> * If the same bug.py is run without checkpointing, there is no issue whatsoever.
> * I am not sure whether this is specific to PySpark.
> * bug.py uses the socketTextStream input, but the problem seems to be independent of the input type (in production I have the same problem with a Kafka direct stream, and have seen it even with textFileStream).
> * It happens even if the input stream produces no data.
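bug.py itself is not included in this message. The following is a hypothetical PySpark driver of the same general shape, offered only as an illustration of the with/without checkpointing comparison. The flag names, app name, and defaults are assumptions. The Spark calls used are `StreamingContext.checkpoint`, `StreamingContext.getOrCreate`, and `socketTextStream`, all from the standard PySpark Streaming API. The sketch uses `optparse` rather than `argparse` because the reported environment is Python 2.6.

```python
import optparse


def parse_args(argv=None):
    """Parse the hypothetical flags of this sketch."""
    parser = optparse.OptionParser()
    parser.add_option("--host", default="localhost")
    parser.add_option("--port", type="int", default=9999)
    parser.add_option("--batch-seconds", dest="batch_seconds", type="int", default=1)
    # Leave --checkpoint-dir unset to run the same job without checkpointing.
    parser.add_option("--checkpoint-dir", dest="checkpoint_dir", default=None)
    options, _ = parser.parse_args(argv)
    return options


def create_context(args):
    """Build the streaming job; checkpointing is enabled only if a directory is given."""
    # Imported here so parse_args can be exercised without a Spark installation.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="checkpoint-finalizer-repro")
    ssc = StreamingContext(sc, args.batch_seconds)
    if args.checkpoint_dir:
        ssc.checkpoint(args.checkpoint_dir)
    # Per the report, the leak appears even when this stream receives no data.
    lines = ssc.socketTextStream(args.host, args.port)
    lines.count().pprint()
    return ssc


def main(argv=None):
    args = parse_args(argv)
    from pyspark.streaming import StreamingContext

    if args.checkpoint_dir:
        # Recover from an existing checkpoint, or build a fresh context via create_context.
        ssc = StreamingContext.getOrCreate(args.checkpoint_dir,
                                           lambda: create_context(args))
    else:
        ssc = create_context(args)
    ssc.start()
    ssc.awaitTermination()
```

To submit it, append a call to `main()` and run it with `spark-submit`, once with `--checkpoint-dir` (on YARN normally an HDFS path) and once without. Then compare the driver heaps.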



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


