spark-issues mailing list archives

From "Imran Rashid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24801) Empty byte[] arrays in spark.network.sasl.SaslEncryption$EncryptedMessage can waste a lot of memory
Date Mon, 16 Jul 2018 19:54:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545676#comment-16545676
] 

Imran Rashid commented on SPARK-24801:
--------------------------------------

I'm surprised there are so many {{EncryptedMessage}} objects sitting around.  Are there 40583
of them?  That sounds like an extremely overloaded shuffle service -- or a leak.  Your proposal
would probably help some in that case, but really there is probably something else we should
be doing differently.

> Empty byte[] arrays in spark.network.sasl.SaslEncryption$EncryptedMessage can waste a
lot of memory
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24801
>                 URL: https://issues.apache.org/jira/browse/SPARK-24801
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.3.0
>            Reporter: Misha Dmitriev
>            Priority: Major
>
> I recently analyzed another Yarn NM heap dump with jxray ([www.jxray.com|http://www.jxray.com]),
and found that 81% of memory is wasted by empty (all zeroes) byte[] arrays. Most of these
arrays are referenced by {{org.apache.spark.network.util.ByteArrayWritableChannel.data}},
and these in turn come from {{spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel}}.
Here is the full reference chain that leads to the problematic arrays:
> {code:java}
> 2,597,946K (64.1%): byte[]: 40583 / 100% of empty 2,597,946K (64.1%)
> ↖org.apache.spark.network.util.ByteArrayWritableChannel.data
> ↖org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel
> ↖io.netty.channel.ChannelOutboundBuffer$Entry.msg
> ↖io.netty.channel.ChannelOutboundBuffer$Entry.{next}
> ↖io.netty.channel.ChannelOutboundBuffer.flushedEntry
> ↖io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelUnsafe.outboundBuffer
> ↖io.netty.channel.socket.nio.NioSocketChannel.unsafe
> ↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.associatedChannel
> ↖{java.util.concurrent.ConcurrentHashMap}.values
> ↖org.apache.spark.network.server.OneForOneStreamManager.streams
> ↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
> ↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
> ↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance{code}
>  
> Checking the code of {{SaslEncryption$EncryptedMessage}}, I see that byteChannel is always
initialized eagerly in the constructor:
> {code:java}
> this.byteChannel = new ByteArrayWritableChannel(maxOutboundBlockSize);{code}
> So I think that to address the problem of empty byte[] arrays flooding the memory, we should
initialize {{byteChannel}} lazily, upon first use. As far as I can see, it is used only
in one method, {{private void nextChunk()}}.
>  
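As a minimal sketch of the lazy-initialization pattern proposed above (this is not the actual Spark source: the class and field names below are hypothetical stand-ins, and a plain {{java.nio.ByteBuffer}} substitutes for {{ByteArrayWritableChannel}}):

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for SaslEncryption$EncryptedMessage, showing only the
// buffer-allocation aspect: the large buffer is created on first use instead of
// eagerly in the constructor, so messages that never reach nextChunk() (e.g.
// those still queued in Netty's ChannelOutboundBuffer) hold no empty byte[].
class LazyBufferHolder {
    private final int maxOutboundBlockSize;
    private ByteBuffer byteChannel;  // null until first use

    LazyBufferHolder(int maxOutboundBlockSize) {
        this.maxOutboundBlockSize = maxOutboundBlockSize;
        // Deliberately no allocation here, unlike the eager
        // `this.byteChannel = new ByteArrayWritableChannel(maxOutboundBlockSize);`
    }

    boolean isAllocated() {
        return byteChannel != null;
    }

    // Mirrors the single call site (`private void nextChunk()`) mentioned above:
    // allocate lazily, then reuse the same buffer on subsequent calls.
    ByteBuffer nextChunk() {
        if (byteChannel == null) {
            byteChannel = ByteBuffer.allocate(maxOutboundBlockSize);
        }
        return byteChannel;
    }
}
```

With this shape, an instance that is constructed but never asked for a chunk costs only the object header and a null field, not a {{maxOutboundBlockSize}}-sized zeroed array.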



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

