From "Saisai Shao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22062) BlockManager does not account for memory consumed by remote fetches
Date Thu, 12 Oct 2017 01:21:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201305#comment-16201305 ]

Saisai Shao commented on SPARK-22062:
-------------------------------------

Yes, there is potentially an OOM problem, but I think it is hard to define whether this kind
of temporarily allocated {{ByteBuffer}} should be accounted against storage memory or execution
memory. Furthermore, how should we handle a remote fetch when memory is not enough: should we
fail the task, or can we stream the remote fetches?
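
On the streaming option, one possibility (a rough sketch only, not what {{BlockTransferService}} does today) would be to hand the caller the fetched {{ManagedBuffer}} as an {{InputStream}} instead of copying the whole block into a single on-heap {{ByteBuffer}}. Here {{fetchBlockAsStream}} and its {{fetchBlock}} parameter are hypothetical names standing in for the async fetch:

{code:scala}
import java.io.InputStream
import org.apache.spark.network.buffer.ManagedBuffer

object StreamingFetchSketch {
  // Hypothetical alternative to the copy in fetchBlockSync: expose the fetched
  // block as a stream, so callers never need a block-sized on-heap array.
  def fetchBlockAsStream(fetchBlock: => ManagedBuffer): InputStream = {
    val buf = fetchBlock
    buf.retain()                      // keep the underlying buffer alive past the fetch callback
    new InputStream {                 // release it once the caller is done reading
      private val in = buf.createInputStream()
      override def read(): Int = in.read()
      override def read(b: Array[Byte], off: Int, len: Int): Int = in.read(b, off, len)
      override def close(): Unit = { in.close(); buf.release() }
    }
  }
}
{code}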

What I can think of is to leverage the current shuffle implementation to spill large blocks
to local disk during fetching, so that tasks read the data from local temporary files; this
would avoid the OOM.
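
A minimal sketch of that spill-to-disk idea, assuming a hypothetical {{fetchToDisk}} helper wrapped around the network fetch (the real shuffle path is more involved): write the incoming block to a local temp file and return a {{FileSegmentManagedBuffer}}, so the task reads from disk rather than from a block-sized heap array.

{code:scala}
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer}
import org.apache.spark.network.util.TransportConf

object FetchToDiskSketch {
  // Hypothetical fetch-to-disk path: spill the remote block to a local temp
  // file during the fetch, then serve it back as a file-backed buffer.
  def fetchToDisk(conf: TransportConf, tmpDir: File)
                 (fetchBlock: => ManagedBuffer): ManagedBuffer = {
    val data = fetchBlock             // buffer handed over by the network layer
    val tmp = File.createTempFile("remote-block", ".tmp", tmpDir)
    val in = data.createInputStream()
    try {
      Files.copy(in, tmp.toPath, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      in.close()
    }
    // The task now reads from local disk instead of a block-sized heap ByteBuffer.
    new FileSegmentManagedBuffer(conf, tmp, 0, tmp.length())
  }
}
{code}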

> BlockManager does not account for memory consumed by remote fetches
> -------------------------------------------------------------------
>
>                 Key: SPARK-22062
>                 URL: https://issues.apache.org/jira/browse/SPARK-22062
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 2.2.0
>            Reporter: Sergei Lebedev
>            Priority: Minor
>
> We use Spark exclusively with {{StorageLevel.DISK_ONLY}} because our workloads are very sensitive
> to memory usage. Recently, we've spotted that our jobs sometimes OOM, leaving lots of {{byte[]}}
> arrays on the heap. Upon further investigation, we found that the arrays come from {{BlockManager.getRemoteBytes}},
> which [calls|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L638]
> {{BlockTransferService.fetchBlockSync}}, which in turn [allocates|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala#L99]
> an on-heap {{ByteBuffer}} of the same size as the block (e.g. a full partition) once the block
> has been successfully retrieved over the network.
> This memory is not accounted towards Spark storage/execution memory and could potentially
> lead to an OOM if {{BlockManager}} fetches too many partitions in parallel. I wonder: is this
> intentional behaviour, or in fact a bug?
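
For reference, the copy the report points at looks roughly like the following (a paraphrase of the pattern in {{BlockTransferService.fetchBlockSync}}, not the verbatim source):

{code:scala}
import java.nio.ByteBuffer
import org.apache.spark.network.buffer.{ManagedBuffer, NioManagedBuffer}

object CopyToHeapSketch {
  // Roughly the pattern in fetchBlockSync's success callback: the fetched block
  // is copied into a freshly allocated on-heap ByteBuffer of the same size, and
  // that allocation is invisible to Spark's memory manager.
  def copyToHeap(data: ManagedBuffer): ManagedBuffer = {
    val ret = ByteBuffer.allocate(data.size.toInt)  // block-sized, unaccounted
    ret.put(data.nioByteBuffer())
    ret.flip()
    new NioManagedBuffer(ret)
  }
}
{code}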




