hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HBASE-900) Regionserver memory leak causing OOME during relatively modest bulk importing
Date Thu, 04 Dec 2008 05:02:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653167#action_12653167 ]

apurtell edited comment on HBASE-900 at 12/3/08 9:01 PM:
---------------------------------------------------------------

Here is a scenario that guarantees a flurry of regionserver OOMEs on my cluster, which is
now running the latest trunk on top of Hadoop 0.18.2-dev + the Ganglia 3.1 patch:

1) Start up Heritrix with hbase-writer. 25 TOEs should do it. Start a long-running job.

2) Build up content until there are ~20 regions per regionserver.

3) Run a mapreduce job that walks a metadata column of the content table -- not all columns,
not the family storing the content itself, just some small auxiliary metadata. (A sketch of
this read pattern follows the list.)

4) Simultaneously with the scanning read (#3), perform what amounts to a bulk import with 5
concurrent writers. (Typical for my load is 4-8GB across maybe a few tens of thousands of
updates.) Specifically, I am using MozillaHtmlParser to build Document objects from text
content and then storing back serialized representations of those Document objects. (Also
sketched below.)
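
For concreteness, here is a minimal sketch of the #3 scan, assuming the 0.18/0.19-era
client API (HTable, Scanner, RowResult). The table name "content" and the column
"meta:fetched" are placeholders for illustration, and exact signatures may differ
slightly between releases:

// Hypothetical sketch only: walk just the small metadata column.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class MetadataScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "content");
    // Request only the metadata column; the content family is never
    // asked for, so its (large) cells should not be returned.
    Scanner scanner =
        table.getScanner(new byte[][] { Bytes.toBytes("meta:fetched") });
    try {
      for (RowResult row : scanner) {
        // Walk the rows; what matters here is the read pattern.
      }
    } finally {
      scanner.close();
    }
  }
}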
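
And a matching sketch of the #4 bulk import, again hedged: five concurrent writer
threads committing BatchUpdates. The value bytes are stubbed out here; in the real
job they are serialized MozillaHtmlParser Document objects. Running this at the
same time as the scan above is what reliably blows the heap here:

// Hypothetical sketch only: 5 concurrent bulk writers using the
// 0.18/0.19-era BatchUpdate write path.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class BulkImport {
  public static void main(String[] args) {
    final HBaseConfiguration conf = new HBaseConfiguration();
    for (int t = 0; t < 5; t++) {
      final int id = t;
      new Thread(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "content");
            for (int n = 0; n < 10000; n++) {
              BatchUpdate update = new BatchUpdate("row-" + id + "-" + n);
              // Stand-in for a serialized MozillaHtmlParser Document.
              update.put("content:doc", new byte[64 * 1024]);
              table.commit(update);
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}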

After an invocation of #4, heap usage has ballooned across the cluster and it is only a matter
of time before regionservers start dying. Memcache is within limits and for my configuration
represents 25% of max heap (I run with a 2G heap), so the remaining data is something else.
Heap histograms from jhat show a very large number of allocations of [B (byte arrays), which
can total as much as 1.5GB. Soon the regionservers will start to compact or do other
heap-intensive work and will fall over.


A flurry of OOMEs can confuse the master. It will reject region opens, thinking the regions
are still closing, and those regions will remain offline until the cluster is manually
restarted. Disabling/enabling the table only makes that particular wrinkle worse.

After restart, invariably a number of regions want to (and do) split. 

> Regionserver memory leak causing OOME during relatively modest bulk importing
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-900
>                 URL: https://issues.apache.org/jira/browse/HBASE-900
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.18.1, 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: stack
>            Priority: Blocker
>         Attachments: memoryOn13.png
>
>
> I have recreated this issue several times and it appears to have been introduced in 0.2.
> During an import to a single table, memory usage of individual region servers grows w/o
> bounds and when set to the default 1GB it will eventually die with OOME.  This has happened
> to me as well as Daniel Ploeg on the mailing list.  In my case, I have 10 RS nodes and OOME
> happens w/ 1GB heap at only about 30-35 regions per RS.  In previous versions, I have imported
> to several hundred regions per RS with default heap size.
> I am able to get past this by increasing the max heap to 2GB.  However, the appearance
> of this in newer versions leads me to believe there is now some kind of memory leak happening
> in the region servers during import.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

