hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-900) Regionserver memory leak causing OOME during relatively modest bulk importing
Date Fri, 05 Dec 2008 04:38:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653623#action_12653623 ]

stack commented on HBASE-900:
-----------------------------

Here is one theory.  Looking with JProfiler at a heap that OOME'd here on the test cluster, there
were a bunch of SoftValue instances (30 or 40k).  I was able to sort them by deep size,
and most of those I looked at held byte arrays of 16k in size.  This would seem to indicate elements
of the blockcache.  The odd thing is that you'd think the SoftValues shouldn't be in the heap
on OOME; they should have been cleared by the GC.  Looking, each store file instance has
a Map of SoftValues, keyed by position into the file.  The GC does the job of moving
the blocks that are to be cleared onto a ReferenceQueue, but unless the ReferenceQueue gets
processed promptly, we'll hold on to the SoftValue references (JProfiler has a button which
says 'clean References' and after selecting this, the SoftValues remained).  The ReferenceQueue
only gets processed when we add a new block to the cache or when we seek to a new location in a
block that we got from the cache.  Otherwise, blocks to be removed are not processed.  If
random-reading or only looking at certain stores in a regionserver, all other storefiles,
unless they are accessed, will continue to hold on to blocks via their uncleared ReferenceQueue.
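
To make the theory concrete, here is a minimal Java sketch of a per-file map of soft values whose ReferenceQueue is drained only when that same file is touched.  It is not the HBase code: the class StoreFileBlocks and the methods cacheBlock, getBlock, processQueue and entryCount are hypothetical names made up for illustration; only java.lang.ref and java.util are real.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for one store file's block map; not HBase code. */
final class StoreFileBlocks {
    /** Soft reference that remembers its key so its map entry can be found later. */
    static final class SoftValue extends SoftReference<byte[]> {
        final long position;

        SoftValue(long position, byte[] block, ReferenceQueue<byte[]> queue) {
            super(block, queue);
            this.position = position;
        }
    }

    // Package-private so a test can reach the references directly.
    final Map<Long, SoftValue> blocks = new HashMap<>();
    private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();

    /** Drops map entries whose references have landed on the queue. */
    private void processQueue() {
        SoftValue stale;
        while ((stale = (SoftValue) queue.poll()) != null) {
            // Two-argument remove leaves a newer entry at the same position alone.
            blocks.remove(stale.position, stale);
        }
    }

    void cacheBlock(long position, byte[] block) {
        processQueue();
        blocks.put(position, new SoftValue(position, block, queue));
    }

    byte[] getBlock(long position) {
        processQueue();
        SoftValue value = blocks.get(position);
        return value == null ? null : value.get();
    }

    int entryCount() {
        return blocks.size();
    }
}
```

With two such instances, reads against one never drain the other's queue, so an idle file keeps its map entries until something touches it again.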

I tried adding a check of the ReferenceQueue every time anything was accessed on a file, but
I still OOME'd using a random read test.

Next thing to try is a single Map that holds all blockcache entries.  There will be lots of contention
on this single Map, but that's better than going to disk any day.  All accesses will check the ReferenceQueue.
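
A rough Java sketch of that single-Map idea follows, assuming entries are keyed by file name plus block position and that one lock guards everything (which is where the contention would come from).  The names SharedBlockCache, cacheBlock, getBlock and entryCount are hypothetical and the key format is an arbitrary choice for illustration; this is not the eventual HBase implementation.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single cache shared by every store file; not HBase code. */
final class SharedBlockCache {
    /** Soft reference that carries its own map key. */
    static final class SoftValue extends SoftReference<byte[]> {
        final String key;

        SoftValue(String key, byte[] block, ReferenceQueue<byte[]> queue) {
            super(block, queue);
            this.key = key;
        }
    }

    // Package-private so a test can reach the references directly.
    final Map<String, SoftValue> entries = new HashMap<>();
    private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();

    /** Assumed key format: file name, '@', byte offset of the block. */
    private static String key(String file, long position) {
        return file + "@" + position;
    }

    /** Removes entries for every file whose references have been enqueued. */
    private void processQueue() {
        SoftValue stale;
        while ((stale = (SoftValue) queue.poll()) != null) {
            entries.remove(stale.key, stale);
        }
    }

    // Every public method takes the same lock and drains the queue first.
    synchronized void cacheBlock(String file, long position, byte[] block) {
        processQueue();
        String k = key(file, position);
        entries.put(k, new SoftValue(k, block, queue));
    }

    synchronized byte[] getBlock(String file, long position) {
        processQueue();
        SoftValue value = entries.get(key(file, position));
        return value == null ? null : value.get();
    }

    synchronized int entryCount() {
        processQueue();
        return entries.size();
    }
}
```

Because there is only one queue, a read against any file cleans up stale entries left behind by all the others, at the cost of serializing access on one lock.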

Only downer is that Tim Sell says his last test was run without blockcache enabled and that
it made no difference.  Maybe try it, Andrew?

Meantime, I'll try the above suggestion.  Andrew, any chance of a copy of your heap dump?
Tim, the same?

> Regionserver memory leak causing OOME during relatively modest bulk importing
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-900
>                 URL: https://issues.apache.org/jira/browse/HBASE-900
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.18.1, 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: stack
>            Priority: Blocker
>         Attachments: memoryOn13.png
>
>
> I have recreated this issue several times and it appears to have been introduced in 0.2.
> During an import to a single table, memory usage of individual region servers grows w/o
> bounds and when set to the default 1GB it will eventually die with OOME.  This has happened
> to me as well as Daniel Ploeg on the mailing list.  In my case, I have 10 RS nodes and OOME
> happens w/ 1GB heap at only about 30-35 regions per RS.  In previous versions, I have imported
> to several hundred regions per RS with default heap size.
> I am able to get past this by increasing the max heap to 2GB.  However, the appearance
> of this in newer versions leads me to believe there is now some kind of memory leak happening
> in the region servers during import.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.