hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14495) TestHRegion#testFlushCacheWhileScanning goes zombie
Date Sat, 26 Sep 2015 05:20:04 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14495:
--------------------------
    Attachment: 14495.txt

Patch that 'fixes' the test so we don't zombie:

HRegion changes are minor cleanup; no effective change.

In MVCC, we were calling wait(0), which means wait forever. Changed it so we wake up
periodically and log the fact that we are 'stuck'.
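
To illustrate the wait(0) change, here is a minimal sketch of a timed wait loop that wakes
up and reports instead of blocking silently. It is not the patch itself: ReadPointWaiter,
advanceTo, stuckReports and the logIntervalMs parameter are hypothetical names made up for
this example, and only Object.wait(long) and notifyAll() are real JDK calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an MVCC read point; this is NOT the HBase class. */
class ReadPointWaiter {
    private final Object lock = new Object();
    private long readPoint = 0;
    // Counts wakeups that found no progress (kept only so the sketch is observable).
    private final AtomicInteger stuckReports = new AtomicInteger();

    /** Advance the read point and wake any waiters. */
    void advanceTo(long newReadPoint) {
        synchronized (lock) {
            if (newReadPoint > readPoint) {
                readPoint = newReadPoint;
            }
            lock.notifyAll();
        }
    }

    /**
     * Block until the read point reaches writeNumber. logIntervalMs must be > 0,
     * because Object.wait(0) means "wait forever", which is the behavior to avoid.
     */
    void waitForRead(long writeNumber, long logIntervalMs) throws InterruptedException {
        if (logIntervalMs <= 0) {
            throw new IllegalArgumentException("logIntervalMs must be > 0");
        }
        synchronized (lock) {
            while (readPoint < writeNumber) {
                lock.wait(logIntervalMs);
                // Woken by timeout, notify, or a spurious wakeup: re-check and report.
                if (readPoint < writeNumber) {
                    stuckReports.incrementAndGet();
                    System.err.println("STUCK: readPoint=" + readPoint
                        + ", waiting for writeNumber=" + writeNumber);
                }
            }
        }
    }

    int stuckReports() {
        return stuckReports.get();
    }
}
```

The waiter still blocks until the read point catches up, so this does not cure a real hang;
it only makes one visible in the logs and thread dumps.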

In WALUtil, if there was an error on append, we could skip out without completing the mvcc. This is a hole.
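
The hole can be sketched as follows, assuming a write entry is begun before the append and
must be completed on every exit path. Appender, WriteTracker and appendEdit are hypothetical
names for this example only; they are not the WALUtil or MVCC signatures.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

class AppendWithCleanup {
    /** Hypothetical stand-in for a WAL append that may fail. */
    interface Appender {
        long append(String edit) throws IOException;
    }

    /** Hypothetical tracker of in-flight write entries. */
    static class WriteTracker {
        final AtomicInteger pending = new AtomicInteger();
        void begin() { pending.incrementAndGet(); }
        void complete() { pending.decrementAndGet(); }
    }

    /**
     * Begin a write entry, then append. If the append throws, complete the
     * entry before propagating, so nobody waits forever on a write that will
     * never finish. On success the caller is expected to complete it later.
     */
    static long appendEdit(Appender wal, WriteTracker mvcc, String edit) throws IOException {
        mvcc.begin();
        boolean appended = false;
        try {
            long sequenceId = wal.append(edit);
            appended = true;
            return sequenceId;
        } finally {
            if (!appended) {
                mvcc.complete(); // the step that was being skipped on error
            }
        }
    }
}
```

Without the finally block, a failed append leaves the entry pending, and anything waiting for
that entry to complete can block indefinitely.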

'Fixed' TestHRegion so it doesn't cause an abort, by removing the flush interrupt... just wait on the flush
to complete.
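
A rough sketch of the 'wait instead of interrupt' idea, assuming the test drives flushes from
a background thread. FlushLoop and stopAndWait are hypothetical names, not the ones used in
TestHRegion; the flush itself is passed in as a plain Runnable.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical background flusher, standing in for the test's flush thread. */
class FlushLoop extends Thread {
    private final AtomicBoolean done = new AtomicBoolean(false);
    final AtomicInteger flushes = new AtomicInteger();
    private final Runnable flush;

    FlushLoop(Runnable flush) {
        this.flush = flush;
    }

    @Override
    public void run() {
        // Each flush runs to completion; the stop flag is only checked between flushes.
        do {
            flush.run();
            flushes.incrementAndGet();
        } while (!done.get());
    }

    /** Ask the loop to stop, then wait for the in-progress flush to finish. */
    void stopAndWait() throws InterruptedException {
        done.set(true);
        join(); // no interrupt(): a flush is never cut off midway
    }
}
```

The trade-off is that shutdown takes as long as the last flush, but the flush is never left
half done.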

> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
>                 Key: HBASE-14495
>                 URL: https://issues.apache.org/jira/browse/HBASE-14495
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14495.txt, 14495.txt
>
>
> This test goes zombie on us, most recently, here: https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on laptop when run in a loop.
> It's hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000007d07c3478> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
> 	- locked <0x00000007d07c3478> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
> 	- locked <0x00000007d07c34a8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
> 	at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
>             // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
>             // with a read point is in advance of this write point.
>             mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it so must have had predilection that this might be dodgy... Let me take a look...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
