flink-dev mailing list archives

From "Zhijiang(wangzhijiang999)" <wangzhijiang...@aliyun.com>
Subject Re: An addition to Netty's memory footprint
Date Fri, 30 Jun 2017 09:59:45 GMT
Building on Kurt's scenario: if the cumulator allocates a large ByteBuf from the ByteBufAllocator
during expansion, this can easily lead to a new PoolChunk (16M) being created, because none of
the current PoolChunks has enough contiguous memory. As a result, the total direct memory used
exceeds the estimate.

To explain further:
1. Each PoolArena maintains a list of PoolChunks, and the PoolChunks are grouped into different
lists based on their memory usage.
2. Each PoolChunk contains a list of subpages (8K), which are organized as a complete balanced
binary tree to make allocation easy.
3. When memory of a given length is allocated from the ByteBufAllocator, the PoolArena loops
over all of its current PoolChunks to find enough contiguous memory. If none is found, it
creates a new chunk.
For example, if a chunk's memory usage is 50%, 8M of the chunk is still available. If the
requested allocation is small, this chunk can satisfy it in most cases. But if the request is
large, say 1M, the remaining 50% may not be able to satisfy it, because the available subpages
are not all under the same parent node in the tree.
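
The fragmentation case above can be reproduced with a small model. This is only a sketch: ToyChunk and ToyArena are hypothetical names, and the code is a simplified buddy allocator rather than Netty's actual PoolChunk/PoolArena implementation. It only assumes the 16M chunk size and 8K leaf size mentioned in this thread, and it ignores requests larger than one chunk.

```python
CHUNK_SIZE = 16 * 1024 * 1024   # 16M chunk, as in the message
PAGE_SIZE = 8 * 1024            # 8K leaf size, as in the message
KB, MB = 1024, 1024 * 1024


class ToyChunk:
    """Simplified buddy allocator standing in for one PoolChunk (assumption)."""

    def __init__(self):
        self.free_blocks = {CHUNK_SIZE: {0}}   # block size -> free offsets
        self.allocated = {}                    # offset -> block size

    @staticmethod
    def _round_up(size):
        block = PAGE_SIZE
        while block < size:
            block *= 2
        return block

    def allocate(self, size):
        want = self._round_up(size)
        block = want
        # Find the smallest free block that is large enough.
        while block <= CHUNK_SIZE and not self.free_blocks.get(block):
            block *= 2
        if block > CHUNK_SIZE:
            return None                        # no contiguous run available
        offset = min(self.free_blocks[block])
        self.free_blocks[block].remove(offset)
        # Split the block, returning the upper halves to the free lists.
        while block > want:
            block //= 2
            self.free_blocks.setdefault(block, set()).add(offset + block)
        self.allocated[offset] = want
        return offset

    def free(self, offset):
        size = self.allocated.pop(offset)
        # Merge with the buddy block for as long as the buddy is free too.
        while size < CHUNK_SIZE:
            buddy = offset ^ size
            if buddy not in self.free_blocks.get(size, set()):
                break
            self.free_blocks[size].remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free_blocks.setdefault(size, set()).add(offset)

    def used(self):
        return sum(self.allocated.values())


class ToyArena:
    """Tries every existing chunk first, then creates a new one."""

    def __init__(self):
        self.chunks = []

    def allocate(self, size):
        for chunk in self.chunks:
            offset = chunk.allocate(size)
            if offset is not None:
                return chunk, offset
        chunk = ToyChunk()
        self.chunks.append(chunk)
        return chunk, chunk.allocate(size)

    def reserved(self):
        return len(self.chunks) * CHUNK_SIZE


arena = ToyArena()
blocks = [arena.allocate(512 * KB) for _ in range(32)]   # fills one chunk
for chunk, offset in blocks[::2]:                        # free every other block
    chunk.free(offset)

first = arena.chunks[0]
print(first.used() // MB, "M used of", CHUNK_SIZE // MB, "M")
arena.allocate(1 * MB)                                   # does not fit in `first`
print(len(arena.chunks), "chunks,", arena.reserved() // MB, "M reserved")
```

In this model the first chunk is exactly 50% used, yet every free 512K block has an allocated buddy, so no 1M run exists. The 1M request therefore forces a second chunk, and reserved memory grows from 16M to 32M.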
After the network improvements described in Stephan's FLIP, the direct memory used by Netty's
PooledByteBuffer can be greatly reduced and easily kept under control.
------------------------------------------------------------------
From: Kurt Young <kurt@apache.org>
Sent: Friday, 30 June 2017, 15:51
To: dev <dev@flink.apache.org>; user <user@flink.apache.org>
Subject: An addition to Netty's memory footprint
Ufuk has written up an excellent document about Netty's memory allocation inside Flink [1],
and I want to add one more note after running some large-scale jobs.
The only inaccuracy in [1] is how much memory LengthFieldBasedFrameDecoder uses. From
our observations, it can cost up to 4M per physical connection.
Why (tl;dr): ByteToMessageDecoder, the base class of LengthFieldBasedFrameDecoder, uses a
Cumulator to hold bytes for further decoding. The Cumulator tries to discard some read bytes
to make room in the buffer when channelReadComplete is triggered. In most cases,
channelReadComplete is only triggered by AbstractNioByteChannel after it has read
"maxMessagesPerRead" times. The default value of maxMessagesPerRead is 16, so in the worst
case the Cumulator will write up to 1M (64K * 16) of data. And due to the logic of ByteBuf's
discardSomeReadBytes, the Cumulator will expand to 4M.
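
The arithmetic behind these numbers can be written down as a short sketch. The function names are hypothetical, and the 100-connection figure is purely illustrative. The 64K read size, the default of 16 and the 4M per-connection worst case are taken from this message rather than derived from Netty's code.

```python
KB, MB = 1024, 1024 * 1024

READ_SIZE = 64 * KB                 # per-read size used in the message
DEFAULT_MAX_MESSAGES_PER_READ = 16  # default quoted in the message
OBSERVED_PER_CONNECTION = 4 * MB    # worst case reported in the message


def bytes_before_read_complete(max_messages_per_read, read_size=READ_SIZE):
    """Upper bound on data written before channelReadComplete fires."""
    return max_messages_per_read * read_size


def decoder_worst_case(connections, per_connection=OBSERVED_PER_CONNECTION):
    """Worst-case decoder memory summed over all physical connections."""
    return connections * per_connection


print(bytes_before_read_complete(DEFAULT_MAX_MESSAGES_PER_READ) // KB, "K")
print(bytes_before_read_complete(2) // KB, "K")
print(decoder_worst_case(100) // MB, "M")   # 100 connections: illustrative only
```

The bound scales linearly with maxMessagesPerRead, so lowering it shrinks the amount of data that can pile up in the Cumulator before it gets a chance to discard read bytes.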
We added an option to tune maxMessagesPerRead, set it to 2, and everything works fine. I
know Stephan is working on network improvements, and replacing the whole Netty pipeline with
Flink's own implementation would be a good choice. But I think we will run into similar logic
when implementing it, so we should be careful about this.
BTW, should we open a JIRA to add this config?

[1] https://cwiki.apache.org/confluence/display/FLINK/Netty+memory+allocation