spark-issues mailing list archives

From "Mark Khaitman (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-5395) Large number of Python workers causing resource depletion
Date Tue, 27 Jan 2015 16:27:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293653#comment-14293653 ]

Mark Khaitman edited comment on SPARK-5395 at 1/27/15 4:26 PM:
---------------------------------------------------------------

Actually, I think I know why this happens... I suspect the problem really comes from the way auto-persistence of specific operations occurs.

reduceByKey, groupByKey, cogroup, etc. are typically heavy shuffle operations whose results get auto-persisted, on the assumption that the resulting RDDs will most likely be used for something right after.


The interesting thing is that this memory sits outside of the executor memory for the framework (it's what goes into the pyspark.daemon processes that get spawned temporarily). The other interesting fact: if we leave the default Python worker memory at 512 MB and a framework uses 8 cores on each executor, it spawns up to 8 * 512 MB (4 GB) of Python workers while the stage is running.
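
(For reference, a rough sketch of where that sizing comes from, assuming the Spark 1.x settings spark.python.worker.memory, the per-worker spill threshold that defaults to 512m, and spark.executor.cores; the app name and exact values are just illustrative, not from any real job:)

{noformat}
# Sketch only: the per-executor Python footprint scales with cores, on top of
# the executor JVM heap (spark.executor.memory).
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("pyspark-worker-memory-sketch")   # illustrative name
        # Spill threshold per Python worker; Spark's default is 512m.
        .set("spark.python.worker.memory", "512m")
        # With 8 cores per executor, up to 8 concurrent pyspark.daemon
        # workers can run, i.e. roughly 8 * 512 MB = 4 GB outside the JVM.
        .set("spark.executor.cores", "8"))
sc = SparkContext(conf=conf)
{noformat}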

[~skrasser] In your case, if you chain a bunch of auto-persisting operations (which I believe coalesce is a part of, since instead of dealing with a shuffle read it builds a potentially large array of partitions on the executor), it will spawn an additional 2 Python workers per executor for that separate task, while the previous tasks' Python workers are left in a sleeping state, waiting for the subsequent task to complete...

If that's the case, then it should actually be fairly easy to show how a single framework can nuke a single host by creating a crazy chain of coalesce/reduceByKey/groupByKey/cogroup operations (which I'm off to try out now, haha).
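
(A hypothetical example of that kind of chain, reusing the sc from the sketch above; the input path and keys are made up:)

{noformat}
# Hypothetical shuffle-heavy chain; each shuffle stage can bring up its own
# set of pyspark.daemon workers on the executors.
pairs = (sc.textFile("hdfs:///tmp/input")               # placeholder path
           .map(lambda line: (line.split(",")[0], 1)))

chained = (pairs
           .reduceByKey(lambda a, b: a + b)             # shuffle 1
           .groupByKey()                                # shuffle 2
           .mapValues(lambda counts: sum(counts))
           .coalesce(16))                               # fewer, larger partitions per executor

other = sc.parallelize([("some-key", 1)])
combined = chained.cogroup(other)                       # shuffle 3
combined.count()                                        # one action triggers the whole chain
{noformat}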

EDIT: I'm almost positive this is what's causing it. Unfortunately there is no easy way to prevent a single framework from wiping out all of the memory on a single box if it does a huge amount of shuffle writing, given the combination of auto-persisting and chained RDD operations that depend on previous RDD computations... You COULD break up the chain by forcing an intermediate step to DISK (using saveAsPickleFile/saveAsTextFile perhaps), and then reading it back in as the next step. At least that would force the previous Python worker daemons to be cleaned up before potentially spawning new ones...
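
(Roughly what that workaround could look like, again reusing the sc from the earlier sketch; the HDFS path is just a placeholder:)

{noformat}
# Sketch of the workaround: materialize the intermediate result to disk so
# the job ends there and the idle pyspark.daemon workers can exit, then
# start a fresh lineage by reading the data back in.
intermediate = (sc.parallelize(range(1000))
                  .map(lambda x: (x % 10, x))
                  .reduceByKey(lambda a, b: a + b))
intermediate.saveAsPickleFile("hdfs:///tmp/intermediate")   # placeholder path

# The next step reads the materialized data instead of extending the old lineage.
restored = sc.pickleFile("hdfs:///tmp/intermediate")
restored.coalesce(16).count()
{noformat}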

Ideally there should be an environment variable for the max number of Python workers allowed to be spawned per executor, because it looks like that doesn't exist yet!



> Large number of Python workers causing resource depletion
> ---------------------------------------------------------
>
>                 Key: SPARK-5395
>                 URL: https://issues.apache.org/jira/browse/SPARK-5395
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: AWS ElasticMapReduce
>            Reporter: Sven Krasser
>
> During job execution a large number of Python workers accumulates, eventually causing YARN to kill containers for going over their memory allocation (in the case below that is about 8G for executors plus 6G for overhead per container).
> In this instance, at the time the container was killed, 97 pyspark.daemon processes had accumulated.
> {noformat}
> 2015-01-23 15:36:53,654 INFO [Reporter] yarn.YarnAllocationHandler (Logging.scala:logInfo(59)) - Container marked as failed: container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics: Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1421692415636_0052_01_000030 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m pyspark.daemon
> |- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m pyspark.daemon
> |- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m pyspark.daemon
> 	[...]
> {noformat}
> The configuration uses 64 containers with 2 cores each.
> Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c
> Mailinglist discussion: https://www.mail-archive.com/user@spark.apache.org/msg20102.html




