hbase-dev mailing list archives

From "Allan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-21031) Memory leak if replay edits failed during region opening
Date Thu, 09 Aug 2018 12:54:00 GMT
Allan Yang created HBASE-21031:
----------------------------------

             Summary: Memory leak if replay edits failed during region opening
                 Key: HBASE-21031
                 URL: https://issues.apache.org/jira/browse/HBASE-21031
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.1, 2.1.0
            Reporter: Allan Yang
            Assignee: Allan Yang


Due to HBASE-21029, when replaying edits that contain many identical cells, the memstore is
never flushed, and an exception is thrown once all heap space has been used:
{code}
2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2]
handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41.,
starting to roll back the global memstore size.
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
        at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
        at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
        at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
        at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
        at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
        at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
        at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
        at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
        at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
        at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
        at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
        at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
{code}
After this exception, the memstore is not rolled back, and since MSLAB is in use, none of the
allocated chunks are ever released. That memory is leaked permanently.

We need to roll back the memstore memory if opening the region fails (for now, only the global
memstore size is decreased after a failure).
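
To make the proposed rollback concrete, here is a minimal sketch. It is not the HBase code or the eventual patch: SketchMemStore and RegionOpenSketch are hypothetical stand-ins, plain byte arrays play the role of MSLAB chunks, and a negative edit size simulates the replay failure. The point it illustrates is that a failed open should release the chunks themselves, not only adjust the global counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a region's memstore; not the HBase API. */
final class SketchMemStore {
    private final List<byte[]> chunks = new ArrayList<>(); // plays the role of MSLAB chunks
    private final AtomicLong globalSize;                   // plays the role of the global memstore size
    private long localSize;

    SketchMemStore(AtomicLong globalSize) {
        this.globalSize = globalSize;
    }

    void add(int bytes) {
        chunks.add(new byte[bytes]);
        localSize += bytes;
        globalSize.addAndGet(bytes);
    }

    int chunkCount() {
        return chunks.size();
    }

    /** Drop every chunk and subtract exactly what this memstore still holds. */
    void rollback() {
        chunks.clear();
        globalSize.addAndGet(-localSize);
        localSize = 0;
    }
}

/** Hypothetical region-open path that rolls back on any failure. */
final class RegionOpenSketch {
    static void open(SketchMemStore store, int[] editSizes) {
        boolean opened = false;
        try {
            for (int size : editSizes) {
                if (size < 0) {
                    // Assumption: stands in for the OutOfMemoryError seen in the log.
                    throw new IllegalStateException("simulated replay failure");
                }
                store.add(size);
            }
            opened = true;
        } finally {
            // A finally block also runs for Errors such as OutOfMemoryError,
            // which a catch (Exception e) clause would miss.
            if (!opened) {
                store.rollback();
            }
        }
    }
}
```

In this model, a failed open leaves both the chunk list and the global counter where they were before the replay started.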

Another problem is that we use replayEditsPerRegion in RegionServerAccounting to record how
much memory is used during replay, and decrease the global memstore size by that amount if the
replay fails. This is not right: the memstore may also be flushed during replay, so the size
recorded in the replayEditsPerRegion map is not accurate at all!
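
The drift can be shown with a small model. This is only a sketch under stated assumptions, not the RegionServerAccounting code: ReplayAccountingSketch and its methods are hypothetical, and it assumes the per-region map only accumulates replayed bytes and is not adjusted when a flush happens mid-replay.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the accounting described above; not the HBase class. */
final class ReplayAccountingSketch {
    long globalMemStoreSize;
    final Map<String, Long> replayEditsPerRegion = new HashMap<>();

    /** A replayed edit grows both the global size and the per-region record. */
    void addReplayedEdit(String region, long bytes) {
        globalMemStoreSize += bytes;
        replayEditsPerRegion.merge(region, bytes, Long::sum);
    }

    /** Assumption: a flush shrinks the global size but leaves the map untouched. */
    void flush(String region, long bytes) {
        globalMemStoreSize -= bytes;
    }

    /** On failure, subtract whatever the map recorded for the region. */
    void rollbackFailedReplay(String region) {
        Long recorded = replayEditsPerRegion.remove(region);
        if (recorded != null) {
            globalMemStoreSize -= recorded;
        }
    }
}
```

For example, replaying 100 bytes, flushing 60, replaying 50 more, and then failing subtracts the recorded 150 bytes from a global size of 90, so the flushed bytes are subtracted twice and the counter goes negative.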



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)