hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload
Date Fri, 14 Oct 2016 00:29:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573654#comment-15573654 ]

stack commented on HBASE-16698:
-------------------------------

Committed the addendum below to address this FindBugs complaint:


Code	Warning
UL	org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion$BatchOperationInProgress) does not release lock on all paths
Bug type UL_UNRELEASED_LOCK
In class org.apache.hadoop.hbase.regionserver.HRegion
In method org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion$BatchOperationInProgress)
At HRegion.java:[line 3313]
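
For readers unfamiliar with this FindBugs category, here is a minimal sketch of the general shape that "does not release lock on all paths" refers to, next to the usual try/finally form. It is not the HRegion code: the class and method names are hypothetical, and the simulated exception only stands in for any failure that can occur while a lock is held.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; none of this is taken from HRegion. */
final class UnreleasedLockSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private long counter = 0;

  /** An exception thrown between lock() and unlock() leaves the lock held. */
  long riskyIncrement(boolean fail) {
    lock.lock();
    if (fail) {
      throw new IllegalStateException("simulated failure while holding the lock");
    }
    long value = ++counter;
    lock.unlock();
    return value;
  }

  /** The finally block releases the lock on every path out of the method. */
  long safeIncrement(boolean fail) {
    lock.lock();
    try {
      if (fail) {
        throw new IllegalStateException("simulated failure while holding the lock");
      }
      return ++counter;
    } finally {
      lock.unlock();
    }
  }

  /** Reports whether the calling thread still holds the lock. */
  boolean heldByCurrentThread() {
    return lock.isHeldByCurrentThread();
  }
}
```

After a failed call to safeIncrement the calling thread no longer holds the lock; after a failed call to riskyIncrement it still does.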


stack-MBP:hbase stack$ git show -1
commit e1923b7c0c14b435ea0d9eb306d968f1927a0c6e
Author: Michael Stack <stack@apache.org>
Date:   Thu Oct 13 17:16:47 2016 -0700

    HBASE-16698 Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload; ADDENDUM. Fix findbugs

diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
index 3715ca1..a486599 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
@@ -3310,7 +3310,6 @@ public class HRegion implements HeapSize, PropagatingConfigurationObserver, Regi
         this.mvcc.advanceTo(batchOp.getReplaySequenceId());
       } else {
         // writeEntry won't be empty if not in replay mode
-        assert writeEntry != null;
         mvcc.completeAndWait(writeEntry);
         writeEntry = null;
       }

@appy kicked me for committing with a FindBugs warning. (Thanks @appy)


> Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16698
>                 URL: https://issues.apache.org/jira/browse/HBASE-16698
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.1.6, 1.2.3
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, in our production environment we observed 98 out of 128 handlers stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside {{WALKey#getWriteEntry}} under a high write workload.
> After digging into the problem, we found that it is mainly caused by advancing mvcc in the append logic. Below is a detailed analysis:
> Under the current branch-1 code, every batch put calls {{WALKey#getWriteEntry}} after appending the edit to the WAL, and {{seqNumAssignedLatch}} is only released when the corresponding append call is handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). Because we currently use a single event handler for the ringbuffer, the append calls are handled one by one (in fact, lots of our current logic depends on this sequential handling), and this becomes a bottleneck under a high write workload.
> The worst part is that by default we use only one WAL per RS, so appends on all regions are handled sequentially, which causes contention among different regions...
> To fix this, we could make use of the same "sequential appends" mechanism: grab the WriteEntry before publishing the append onto the ringbuffer and use it as the sequence id, with the one requirement that we add a lock to make "grab WriteEntry" and "append edit" a single transaction. This will still cause contention inside a region but avoids contention between different regions. This solution has already been verified in our online environment and proved effective.
> Note that on the master (2.0) branch, since we have already changed the write pipeline to sync before writing to the memstore (HBASE-15158), this issue exists only for the ASYNC_WAL write scenario.
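
The proposal in the description above (take the WriteEntry first, then publish the append, with a lock making the two steps one transaction) can be sketched roughly as follows. This is a toy model under stated assumptions, not the HBase implementation: every class, field, and method name is hypothetical, an AtomicLong stands in for the region's MVCC write point, and a LinkedBlockingQueue stands in for the WAL ringbuffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of one region; all names are hypothetical, none are HBase APIs. */
final class SequencedAppendSketch {
  /** Stand-in for an MVCC write entry: it only carries a sequence id. */
  static final class WriteEntry {
    final long seqId;
    WriteEntry(long seqId) { this.seqId = seqId; }
  }

  /** Stand-in for an append published to the WAL. */
  static final class Append {
    final long seqId;
    final String edit;
    Append(long seqId, String edit) { this.seqId = seqId; this.edit = edit; }
  }

  private final AtomicLong nextSeqId = new AtomicLong(1);        // assumed per-region write point
  private final ReentrantLock appendLock = new ReentrantLock();  // assumed per-region lock
  private final BlockingQueue<Append> wal;                       // assumed shared across regions

  SequencedAppendSketch(BlockingQueue<Append> wal) { this.wal = wal; }

  /** Assigns the sequence id and publishes the append as one step under the region lock. */
  WriteEntry append(String edit) {
    appendLock.lock();
    try {
      WriteEntry entry = new WriteEntry(nextSeqId.getAndIncrement());
      wal.add(new Append(entry.seqId, edit));
      return entry; // the caller already has its entry, so it does not wait on a latch
    } finally {
      appendLock.unlock();
    }
  }

  /** Runs concurrent appends on one region and checks that queue order matches id order. */
  static boolean appendsStayOrdered(int threads, int appendsPerThread) {
    BlockingQueue<Append> wal = new LinkedBlockingQueue<>();
    SequencedAppendSketch region = new SequencedAppendSketch(wal);
    List<Thread> handlers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread handler = new Thread(() -> {
        for (int i = 0; i < appendsPerThread; i++) {
          region.append("edit");
        }
      });
      handlers.add(handler);
      handler.start();
    }
    try {
      for (Thread handler : handlers) {
        handler.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    long expected = 1;
    for (Append a : wal) {
      if (a.seqId != expected++) {
        return false;
      }
    }
    return expected == (long) threads * appendsPerThread + 1;
  }
}
```

In this model the lock belongs to a single region, so handlers writing to different regions never block one another on it, while handlers writing to the same region still serialize, which mirrors the trade-off the description states.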



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
