hbase-issues mailing list archives

From "Dave Latham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
Date Mon, 04 Nov 2013 18:11:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813072#comment-13813072 ]

Dave Latham commented on HBASE-9865:

Lars, you beat me to the punch on that read failure case.  I'm also not sure why it is the
way it is or how it should be, but I noticed that the patch seemed to change it.  It seems
best to replicate all you can for a corrupt log.  JD, any thoughts or more cool stories?

I also agree that #2 is more serious than #1.  However the issue as filed and described was
targeted at #1.  Lars, what do you think about adding a simple check in ReplicationSource.removeNonReplicableEdits
to trimToSize if more than half the KVs are removed?
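
A rough sketch of what that check could look like, written against a plain ArrayList rather
than the actual ReplicationSource code.  The class name, the method name and the predicate
are made up for illustration; only ArrayList.removeIf and ArrayList.trimToSize are real
JDK calls.

```java
import java.util.ArrayList;
import java.util.function.Predicate;

// Hypothetical helper, not HBase code: filter a list in place and release
// the unused part of the backing array when most entries were dropped.
class TrimAfterFilter {

    /**
     * Removes every element that does not satisfy {@code keep}. If more than
     * half of the original elements were removed, trims the backing array to
     * the new size. Returns the number of removed elements.
     */
    static <T> int removeAndMaybeTrim(ArrayList<T> kvs, Predicate<T> keep) {
        int before = kvs.size();
        kvs.removeIf(keep.negate());
        int removed = before - kvs.size();
        // "More than half removed" without integer-division rounding.
        if (removed * 2 > before) {
            kvs.trimToSize();
        }
        return removed;
    }
}
```

The trim costs one array copy, so it only pays off when a large share of the list is gone,
which is why the threshold is there.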

A little more background as we've deciphered some behavior on our cluster in case anyone is
curious.  We're running clusters in a pair of data centers, and just migrated one of those
data centers which involved shutting off replication with one cluster and getting it going
with another one.  As part of that process we managed to get some edits stuck in a replication
cycle without realizing it ( HBASE-9888 and HBASE-7709 ).  Because those edits got batched
up with edits from other clusters ( HBASE-9158 ) it created some enormous edits that varied
by position leading to this particular pain.

> WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers
to go OOM
> --------------------------------------------------------------------------------------------------------
>                 Key: HBASE-9865
>                 URL: https://issues.apache.org/jira/browse/HBASE-9865
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.5, 0.95.0
>            Reporter: churro morales
>            Assignee: Lars Hofhansl
>         Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt,
> WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers
to go OOM.
> A little background on this issue.  We noticed that our source replication regionservers
would get into gc storms and sometimes even OOM. 
> We noticed a case where it showed that there were around 25k WALEdits to replicate, each
one with an ArrayList of KeyValues.  The array list had a capacity of around 90k (using 350KB
of heap memory) but had around 6 non null entries.
> When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit it removes
all kv's that are scoped other than local.  
> But in doing so we don't account for the capacity of the ArrayList when determining heapSize
for a WALEdit.  The logic for shipping a batch is whether you have hit a size capacity or
number of entries capacity.  
> Therefore if we have a WALEdit with 25k entries and suppose all are removed: 
> The size of the arrayList is 0 (we don't even count the collection's heap size currently)
but the capacity is ignored.
> This will yield a heapSize() of 0 bytes while in the best case it would be at least 100000
bytes (provided you pass initialCapacity and you have 32 bit JVM) 
> I have some ideas on how to address this problem and want to know everyone's thoughts:
> 1. We use a probabilistic counter such as HyperLogLog and create something like:
> 	* class CapacityEstimateArrayList extends ArrayList
> 		** this class overrides all additive methods to update the probabilistic counts
> 		** it includes one additional method called estimateCapacity (we would take estimateCapacity
- size() and fill in sizes for all references)
> 	* Then we can do something like this in WALEdit.heapSize:
> {code}
>   public long heapSize() {
>     long ret = ClassSize.ARRAYLIST;
>     for (KeyValue kv : kvs) {
>       ret += kv.heapSize();
>     }
>     long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
>     ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
>     if (scopes != null) {
>       ret += ClassSize.TREEMAP;
>       ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
>       // TODO this isn't quite right, need help here
>     }
>     return ret;
>   }	
> {code}
> 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally,
and we provide some percentage threshold.  When that threshold is met (50% of the entries
have been removed) we can call kvs.trimToSize()
> 3. in the heapSize() method for WALEdit we could use reflection (Please don't shoot me
for this) to grab the actual capacity of the list.  Doing something like this:
> {code}
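> public int getArrayListCapacity() {
>     try {
>       // Requires java.lang.reflect.Field; reads ArrayList's private backing array.
>       Field f = ArrayList.class.getDeclaredField("elementData");
>       f.setAccessible(true);
>       return ((Object[]) f.get(kvs)).length;
>     } catch (Exception e) {
>       log.warn("Exception in trying to get capacity on ArrayList", e);
>       // Fall back to the element count if the capacity cannot be read.
>       return kvs.size();
>     }
> }
> {code}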
> I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this
is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists.
 The memory footprint is very small and it is very fast.  The issue is that this is an estimate;
although we can configure the precision, we will most likely always be conservative.  The estimateCapacity
will always be less than the actualCapacity, but it will be close. I think that putting the
logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in
this particular scenario.  Solution 3 is slow and horrible but gives us the exact answer.
> I would love to hear if anyone else has any other ideas on how to remedy this problem.
 I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks
any of these approaches is a viable one.
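
For reference, the arithmetic behind the numbers in the quoted description can be sketched
as below.  This is only an illustration: the class and method names are hypothetical, the
4-byte reference size is the 32-bit JVM case mentioned in the description, and the array
header and alignment are ignored.

```java
// Hypothetical illustration, not HBase code: bytes pinned by the reference
// slots of an ArrayList's backing array, whether or not the slots are used.
class BackingArrayEstimate {

    /** Bytes taken by {@code capacity} reference slots (no header, no alignment). */
    static long referenceSlotBytes(int capacity, int referenceSize) {
        return (long) capacity * referenceSize;
    }

    /** Slots that are allocated but hold no element. */
    static int unusedSlots(int capacity, int size) {
        return capacity - size;
    }

    public static void main(String[] args) {
        // 25k slots at 4 bytes each: the 100000-byte lower bound from the description.
        System.out.println(referenceSlotBytes(25_000, 4));
        // 90k slots at 4 bytes each: 360000 bytes, roughly the 350KB that was observed.
        System.out.println(referenceSlotBytes(90_000, 4));
        // A capacity of 90k with 6 live entries leaves 89994 empty slots.
        System.out.println(unusedSlots(90_000, 6));
    }
}
```

None of these bytes show up in a heapSize() that only sums the live KeyValues, which is the
gap the three proposals try to close.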

This message was sent by Atlassian JIRA
