hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14495) TestHRegion#testFlushCacheWhileScanning goes zombie
Date Tue, 29 Sep 2015 05:29:04 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14495:
--------------------------
    Release Note: HBASE-12751 changed the WAL append: every append now sets a latch on an
edit, and that latch must be cleared or else the WAL will hang. The original failures in
TestHRegion turned up 'holes' where we failed to release the latch when we skipped out early
because we were interrupted. Other 'holes' were found where we had mocked up a WAL, so the
latch would just stay in place. Further holes were found when appending WAL markers: here we
were skipping the mvcc completely for a few edits. A cleanup of WALUtils made all markers
take the same code paths.
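
The latch 'holes' described in the note can be illustrated with a minimal sketch. Everything here is hypothetical (LatchedEdit, LatchSketch, appendLeaky and appendSafe are illustration names, not the HBase WAL API), and the interrupt is modeled as a plain boolean. It only shows the general pattern: an early return that skips the release leaves the latch set, while a release in a finally block covers every exit path.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an edit that carries a latch; not an HBase class.
class LatchedEdit {
    private final CountDownLatch latch = new CountDownLatch(1);

    // Clears the latch so that anything waiting on this edit can proceed.
    void release() {
        latch.countDown();
    }

    // Returns true if the latch was cleared within the timeout.
    boolean awaitRelease(long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class LatchSketch {
    // The 'hole': skipping out early on interrupt never clears the latch.
    static void appendLeaky(LatchedEdit edit, boolean interrupted) {
        if (interrupted) {
            return; // latch stays set; a waiter would hang
        }
        // ... the real append work would happen here ...
        edit.release();
    }

    // One possible fix: release in finally so every exit path clears the latch.
    static void appendSafe(LatchedEdit edit, boolean interrupted) {
        try {
            if (interrupted) {
                return;
            }
            // ... the real append work would happen here ...
        } finally {
            edit.release();
        }
    }
}
```

With the leaky variant, a waiter on an interrupted edit times out (or, with an untimed wait, hangs); with the safe variant it returns promptly. Whether the actual patch uses try/finally or clears the latch some other way is not stated in this message.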

> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
>                 Key: HBASE-14495
>                 URL: https://issues.apache.org/jira/browse/HBASE-14495
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt, 14495v7.txt, 14495v9.txt
>
>
> This test goes zombie on us, most recently, here: https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on my laptop when run in a loop.
> It's hung up in the close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000007d07c3478> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
> 	- locked <0x00000007d07c3478> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
> 	- locked <0x00000007d07c34a8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
> 	at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
>             // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
>             // with a read point is in advance of this write point.
>             mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it, so I must have had a suspicion that this might be dodgy... Let me take a look...
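
Why a wait on mvcc can hang, as in the stack trace above, can be shown with a simplified model. This is a sketch under assumptions, not HBase's MultiVersionConcurrencyControl: MiniMvcc and its methods are hypothetical, and the timeout exists only so the example terminates (the wait in the trace has no timeout). In this model the read point advances only across a contiguous run of completed write entries, so one entry that is begun but never completed blocks every later writer that waits for the read point to catch up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of an mvcc write queue; not the HBase class.
class MiniMvcc {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed; // guarded by the MiniMvcc monitor

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    private final Deque<WriteEntry> queue = new ArrayDeque<>();
    private long nextWrite = 1;
    private long readPoint = 0;

    // Starts a write and queues its entry.
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(nextWrite++);
        queue.addLast(e);
        return e;
    }

    // Marks the entry done and advances the read point past every
    // completed entry at the head of the queue.
    synchronized void complete(WriteEntry e) {
        e.completed = true;
        while (!queue.isEmpty() && queue.peekFirst().completed) {
            readPoint = queue.removeFirst().writeNumber;
        }
        notifyAll();
    }

    // Completes the entry, then waits for the read point to reach it.
    // Returns false on timeout; an untimed version would block forever.
    synchronized boolean completeAndWait(WriteEntry e, long timeoutMs) {
        complete(e);
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (readPoint < e.writeNumber) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    synchronized long readPoint() {
        return readPoint;
    }
}
```

In this model, if an earlier entry is begun and then abandoned, completeAndWait on a later entry returns false after the timeout; once the abandoned entry is completed, the read point jumps past both.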



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
