hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
Date Wed, 28 Mar 2018 02:52:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416679#comment-16416679 ]

Hudson commented on HBASE-20090:
--------------------------------

Results for branch HBASE-19064
	[build #77 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: 20090-server-61260-01-000007.log, 20090.v10.txt, 20090.v10.txt, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt
>
>
> Copied the following from a comment, since it gives a better description of the race condition.
> The original description was merged into the beginning of my first comment below.
> Observed the following in the region server log (running on a hadoop3 cluster):
> {code}
> 2018-02-26 16:06:50,044 WARN  [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms
> 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.MemStoreFlusher$WakeupFlushThread@2adfadd7
> java.lang.IllegalStateException
>         at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> Here is the Precondition from MemStoreFlusher#flushOneForGlobalPressure():
> {code}
>       Preconditions.checkState(
>         (regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0);
> {code}
> When the Preconditions check fails, an IllegalStateException is raised.
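> For reference, Guava's checkState(boolean) throws an IllegalStateException with no detail message when its argument is false, which matches the message-less exception in the stack trace above. A minimal standalone sketch (the demo class is hypothetical):
> {code}
> import org.apache.hbase.thirdparty.com.google.common.base.Preconditions;
>
> public class CheckStateDemo {
>   public static void main(String[] args) {
>     // Throws java.lang.IllegalStateException with no message, as in the stack trace.
>     Preconditions.checkState(false);
>   }
> }
> {code}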
> With more debug logging, we can see the scenario where the exception was triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO  [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=381.5 M
> 2018-03-02 17:28:30,297 INFO  [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 20090-server-61260-01-000007.log:
> {code}
>       synchronized (writestate) {
>         if (!writestate.flushing && writestate.writesEnabled) {
>           this.writestate.flushing = true;
>         } else {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("NOT flushing memstore for region " + this
>                 + ", flushing=" + writestate.flushing + ", writesEnabled="
>                 + writestate.writesEnabled);
>           }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving memory pressure at a high level.
> When MemStoreFlusher reached the following call, the region was no longer a flush candidate:
> {code}
>       HRegion bestFlushableRegion =
>           getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined next. Since that region was not receiving writes, the (current) Precondition check failed.
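> To illustrate with the values from the log above (a standalone sketch; the variable names follow flushOneForGlobalPressure(), and the literal values are inferred from the log, not taken from the code):
> {code}
> // atlas_janus was the remaining candidate but was not receiving writes,
> // so its memstore size was 0; there was no flushable region replica either.
> long regionToFlushSize = 0;      // memstore size of atlas_janus
> long bestRegionReplicaSize = 0;  // no region replica to flush
>
> // Both disjuncts are false, so the checkState argument is false and
> // the java.lang.IllegalStateException in the stack trace is thrown.
> boolean flushable = regionToFlushSize > 0 || bestRegionReplicaSize > 0;
> {code}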
> The proposed fix is to convert the Precondition to a normal return.
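> A minimal sketch of that change (hypothetical, not the attached patch; it assumes flushOneForGlobalPressure() reports success via its boolean return value):
> {code}
> // Replace the Preconditions.checkState(...) with a normal return:
> if (!((regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0)) {
>   // No flushable region right now, e.g. the biggest region is mid-split and
>   // the remaining candidates have empty memstores; let the flush loop retry.
>   LOG.info("No flushable region found under global memory pressure; will retry");
>   return false;
> }
> {code}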



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
