[ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186372#comment-15186372 ]
Brahma Reddy Battula commented on HDFS-9917:
--------------------------------------------
*The following was tested on trunk, not on the original cluster data.*
{noformat}
After stopping the SNN:
=======================
BLR1000006554:/opt/Trunk/hadoop/bin # jmap -histo:live 34458 | tee dnheap.log | grep -i org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo
19: 2801 67224 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo
852: 3 72 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo$BlockStatus
1234: 1 32 [Lorg.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo$BlockStatus;
After 10 minutes of only writing files:
=======================================
BLR1000006554:/opt/Trunk/hadoop/bin # jmap -histo:live 34458 | tee dnheap.log | grep -i org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo
5: 73957 1774968 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo
852: 3 72 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo$BlockStatus
1234: 1 32 [Lorg.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo$BlockStatus;
After restarting the SNN:
=========================
BLR1000006554:/opt/Trunk/hadoop/bin # jmap -histo:live 34458 | tee dnheap.log | grep -i org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo
848: 3 72 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo$BlockStatus
1237: 1 32 [Lorg.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo$BlockStatus;
{noformat}
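The growth between the first two histograms can be quantified directly from the rows shown above. The following is only a rough sketch: the helper names (`parse_row`, `summarize`) are made up for illustration, the 10-minute interval is taken at face value from the heading, and the byte figures are the shallow sizes reported by jmap, not the full retained heap.

```python
import re

# Histogram rows copied from the jmap output above:
# rank, instance count, shallow bytes, class name.
BEFORE = "19: 2801 67224 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo"
AFTER = "5: 73957 1774968 org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo"

ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)$")


def parse_row(line):
    """Return (instances, bytes, class_name) for one jmap -histo row."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"not a histogram row: {line!r}")
    return int(m.group(2)), int(m.group(3)), m.group(4)


def summarize(before, after, minutes):
    """Compare two rows for the same class taken `minutes` apart (assumed interval)."""
    b_count, b_bytes, _ = parse_row(before)
    a_count, a_bytes, _ = parse_row(after)
    return {
        "bytes_per_instance": a_bytes // a_count,
        "added_instances": a_count - b_count,
        "added_bytes": a_bytes - b_bytes,
        "instances_per_minute": (a_count - b_count) / minutes,
    }


summary = summarize(BEFORE, AFTER, minutes=10)
print(summary)
```

On these numbers, each ReceivedDeletedBlockInfo takes 24 bytes of shallow heap, and the DN accumulated 71156 additional instances while the SNN was down.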
> IBR accumulate more objects when SNN was down for sometime.
> -----------------------------------------------------------
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
>
> The SNN was down for some time. After it was restarted, it became unresponsive because:
> - 29 DNs were each sending IBRs of 5 million blocks (most of them delete IBRs), whereas
> each datanode had only ~2.5 million blocks.
> - GC could not reclaim these objects, since all of them were held in the RPC queue.
> To recover (to clear these objects), all the DNs were restarted one by one. This issue
> happened in 2.4.1, where splitting of block reports was not available.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)