spark-issues mailing list archives

From "Jie Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-18208) Executor OOM due to a memory leak in BytesToBytesMap
Date Tue, 01 Nov 2016 21:28:59 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Xiong updated SPARK-18208:
------------------------------
    Description: 
While running a Spark job, we see the job fail because of an executor OOM with the following
stack trace:

{quote}
java.lang.OutOfMemoryError: No enough memory for aggregation
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:161)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:86)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{quote}

The code tries to reuse the BytesToBytesMap after spilling by calling its reset function
(see https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java#L897).
The reset function releases all memory pages, but it does not reset the pointer array.
If the pointer array has grown beyond the fair share, the BytesToBytesMap cannot be
allocated any further memory pages, hence the OOM.
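To make the failure mode concrete, here is a minimal toy model in Java. Everything in it is hypothetical: ToyMemoryManager, ToyMap, the byte sizes, and the drop in fair share (assumed here to happen because more tasks become active) are illustrative stand-ins, not Spark's TaskMemoryManager or BytesToBytesMap APIs. The resetShrinksArray flag contrasts the reported behaviour (reset frees only the data pages) with one plausible remedy assumed for illustration (reset also shrinks the pointer array back to its initial size).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Spark's
// TaskMemoryManager / BytesToBytesMap APIs.
public class ResetLeakDemo {

    /** Hypothetical per-task budget that refuses any request above the current fair share. */
    static final class ToyMemoryManager {
        long fairShare;
        long used = 0;

        ToyMemoryManager(long fairShare) { this.fairShare = fairShare; }

        boolean acquire(long bytes) {
            if (used + bytes > fairShare) return false; // would exceed the fair share
            used += bytes;
            return true;
        }

        void release(long bytes) { used -= bytes; }
    }

    /** Hypothetical stand-in for BytesToBytesMap: one pointer array plus data pages. */
    static final class ToyMap {
        final ToyMemoryManager mm;
        final long initialArrayBytes;
        final long pageBytes;
        final boolean resetShrinksArray; // false = reported behaviour, true = assumed remedy
        long arrayBytes;
        final List<Long> pages = new ArrayList<>();

        ToyMap(ToyMemoryManager mm, long initialArrayBytes, long pageBytes, boolean resetShrinksArray) {
            this.mm = mm;
            this.initialArrayBytes = initialArrayBytes;
            this.pageBytes = pageBytes;
            this.resetShrinksArray = resetShrinksArray;
            if (!mm.acquire(initialArrayBytes)) throw new IllegalStateException("no memory for pointer array");
            this.arrayBytes = initialArrayBytes;
        }

        /** Doubles the pointer array; the new array is acquired before the old one is released. */
        boolean growArray() {
            long newBytes = arrayBytes * 2;
            if (!mm.acquire(newBytes)) return false;
            mm.release(arrayBytes);
            arrayBytes = newBytes;
            return true;
        }

        /** Allocates one data page; returning false corresponds to the OOM in the report. */
        boolean acquireNewPage() {
            if (!mm.acquire(pageBytes)) return false;
            pages.add(pageBytes);
            return true;
        }

        /** Called after spilling: always frees the data pages, optionally shrinks the array too. */
        void reset() {
            for (long p : pages) mm.release(p);
            pages.clear();
            if (resetShrinksArray && arrayBytes > initialArrayBytes) {
                mm.release(arrayBytes);
                mm.acquire(initialArrayBytes); // cannot fail: we just released more than this
                arrayBytes = initialArrayBytes;
            }
        }
    }

    /** Grows the map, lowers the fair share, spills, and reports whether a page can still be allocated. */
    static boolean pageAvailableAfterSpill(boolean resetShrinksArray) {
        ToyMemoryManager mm = new ToyMemoryManager(256);
        ToyMap map = new ToyMap(mm, 16, 64, resetShrinksArray);
        for (int i = 0; i < 3; i++) map.growArray(); // pointer array: 16 -> 32 -> 64 -> 128 bytes
        map.acquireNewPage();                        // used = 128 + 64 = 192 bytes
        mm.fairShare = 128;                          // assumption: share shrinks as more tasks start
        map.reset();                                 // spill, then reuse the map
        return map.acquireNewPage();
    }

    public static void main(String[] args) {
        System.out.println("reset keeps pointer array:   page allocated = " + pageAvailableAfterSpill(false));
        System.out.println("reset shrinks pointer array: page allocated = " + pageAvailableAfterSpill(true));
    }
}
```

In this toy run the 128-byte pointer array alone already fills the reduced 128-byte share, so without shrinking it the next 64-byte page request is refused; shrinking the array to 16 bytes in reset() leaves room for the page (16 + 64 = 80 bytes).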


> Executor OOM due to a memory leak in BytesToBytesMap
> ----------------------------------------------------
>
>                 Key: SPARK-18208
>                 URL: https://issues.apache.org/jira/browse/SPARK-18208
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.0.0
>            Reporter: Jie Xiong
>            Priority: Blocker



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
