spark-issues mailing list archives

From "Tolstopyatov Vsevolod (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-22625) Properly cleanup inheritable thread-locals
Date Tue, 28 Nov 2017 09:46:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tolstopyatov Vsevolod updated SPARK-22625:
------------------------------------------
    Description: 
A memory leak is present due to inherited thread locals; SPARK-20558 didn't fix it properly.

Our production application has the following logic: one thread reads from HDFS, and another
one creates a SparkContext, processes HDFS files, and then closes it on a regular schedule.

Depending on which thread started first, the SparkContext thread-local may or may not be inherited
by the HDFS daemon (DataStreamer), causing a memory leak when the streamer is created after the
SparkContext. Memory consumption increases every time a new SparkContext is created; related YourKit paths:
https://screencast.com/t/tgFBYMEpW
The problem is more general and is not related to HDFS in particular.
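For illustration, here is a minimal standalone sketch (not Spark code; the byte array merely stands in for the context's local properties) of the inheritance behaviour behind the leak: a thread created after the value is set inherits it into its own slot, and the parent thread can never clear that copy.

{code:scala}
object InheritableThreadLocalLeak {
  private val local = new InheritableThreadLocal[Array[Byte]]

  def main(args: Array[String]): Unit = {
    // Stand-in for SparkContext's localProperties being set on this thread.
    local.set(new Array[Byte](64 * 1024 * 1024))

    // A long-lived daemon thread (think: HDFS DataStreamer) created now
    // inherits the current value into its own thread-local slot.
    val daemon = new Thread(new Runnable {
      override def run(): Unit = while (true) Thread.sleep(1000)
    })
    daemon.setDaemon(true)
    daemon.start()

    // Clearing the parent's slot does not touch the daemon's inherited copy,
    // so the array stays strongly reachable for as long as the daemon lives.
    local.remove()
  }
}
{code}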

Proper fix: register all cloned properties (in `localProperties#childValue`) in a ConcurrentHashMap
and forcefully clear all of them in `SparkContext#close`.
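A hedged sketch of what that fix could look like (the class and field names, e.g. `LocalPropertiesTracker` and `allClonedProperties`, are illustrative, not the actual patch; the clone call mirrors what `childValue` already does):

{code:scala}
import java.util.Properties
import java.util.concurrent.ConcurrentHashMap

import scala.collection.JavaConverters._

import org.apache.commons.lang3.SerializationUtils

class LocalPropertiesTracker {
  // Concurrent set (backed by a ConcurrentHashMap) of every clone handed out.
  private val allClonedProperties = ConcurrentHashMap.newKeySet[Properties]()

  val localProperties = new InheritableThreadLocal[Properties] {
    override protected def childValue(parent: Properties): Properties = {
      val cloned = SerializationUtils.clone(parent)
      allClonedProperties.add(cloned) // register the clone so it can be cleared later
      cloned
    }
    override protected def initialValue(): Properties = new Properties()
  }

  // To be called when the context is closed: empty every registered clone so
  // threads that inherited them no longer pin the old context's properties.
  def clearAll(): Unit = {
    allClonedProperties.asScala.foreach(_.clear())
    allClonedProperties.clear()
  }
}
{code}

Clearing the registered clones (rather than the thread-locals themselves) sidesteps the fact that one thread cannot remove entries from another thread's thread-local map.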

  was:
A memory leak is present due to inherited thread locals; SPARK-20558 didn't fix it properly.

Our production application has the following logic: one thread reads from HDFS, and another
one creates a SparkContext, processes HDFS files, and then closes it on a regular schedule.

Depending on which thread started first, the SparkContext thread-local may or may not be inherited
by the HDFS daemon (DataStreamer), causing a memory leak when the streamer is created after the
SparkContext. Memory consumption increases every time a new SparkContext is created; related YourKit paths:
https://screencast.com/t/tgFBYMEpW

Proper fix: register all cloned properties (in `localProperties#childValue`) in a ConcurrentHashMap
and forcefully clear all of them in `SparkContext#close`.


> Properly cleanup inheritable thread-locals
> ------------------------------------------
>
>                 Key: SPARK-22625
>                 URL: https://issues.apache.org/jira/browse/SPARK-22625
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Tolstopyatov Vsevolod
>              Labels: leak
>
> A memory leak is present due to inherited thread locals; SPARK-20558 didn't fix it properly.
> Our production application has the following logic: one thread reads from HDFS, and another
one creates a SparkContext, processes HDFS files, and then closes it on a regular schedule.
> Depending on which thread started first, the SparkContext thread-local may or may not be inherited
by the HDFS daemon (DataStreamer), causing a memory leak when the streamer is created after the
SparkContext. Memory consumption increases every time a new SparkContext is created; related YourKit paths:
https://screencast.com/t/tgFBYMEpW
> The problem is more general and is not related to HDFS in particular.
> Proper fix: register all cloned properties (in `localProperties#childValue`) in a ConcurrentHashMap
and forcefully clear all of them in `SparkContext#close`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

