hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14495) TestHRegion#testFlushCacheWhileScanning goes zombie
Date Fri, 30 Oct 2015 05:35:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981948#comment-14981948
] 

ramkrishna.s.vasudevan commented on HBASE-14495:
------------------------------------------------

{code}
015-10-30 10:38:12,446 DEBUG [FlushThread] regionserver.HRegionFileSystem(382): Committing
store file C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/7a13a0ecbc5140b0a37ea6e76055f994
as C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/7a13a0ecbc5140b0a37ea6e76055f994
2015-10-30 10:38:12,451 INFO  [FlushThread] regionserver.HStore(987): Added C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/7a13a0ecbc5140b0a37ea6e76055f994,
entries=24, sequenceid=953, filesize=5.9 K
2015-10-30 10:38:12,452 INFO  [FlushThread] regionserver.HRegion(2443): Finished memstore
flush of ~4.13 KB/4224, currentsize=2.58 KB/2640 for region testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.
in 42ms, sequenceid=953, compaction requested=true
2015-10-30 10:38:12,454 INFO  [FlushThread] regionserver.HRegion(2146): Flushing 1/1 column
families, memstore=2.58 KB
2015-10-30 10:38:12,454 INFO  [main] regionserver.HRegion(7836): writing data to region testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.
with WAL disabled. Data may be lost in the event of a crash.
2015-10-30 10:38:12,458 DEBUG [FlushThread] hfile.HFile$WriterFactory(309): Unable to set
drop behind on C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,461 INFO  [FlushThread] regionserver.DefaultStoreFlusher(90): Flushed,
sequenceid=972, memsize=2.6 K, hasBloomFilter=true, into tmp file C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,491 DEBUG [FlushThread] regionserver.HRegionFileSystem(382): Committing
store file C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
as C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,569 ERROR [FlushThread] regionserver.HStore$StoreFlusherImpl(2147): Failed
to commit store file C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from
file file:/C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/fb0fa5a1b14d4efa9efbf392268b3be1
	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:517)
	at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:546)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1140)
	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:265)
	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:404)
	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:515)
	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:505)
	at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:672)
	at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:664)
	at org.apache.hadoop.hbase.regionserver.HStore.commitFile(HStore.java:980)
	at org.apache.hadoop.hbase.regionserver.HStore.access$200(HStore.java:119)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2145)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2352)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2078)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2040)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1931)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1857)
	at org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3797)
Caused by: java.nio.channels.ClosedByInterruptException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source)
	at sun.nio.ch.FileChannelImpl.position(Unknown Source)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.seek(RawLocalFileSystem.java:115)
	at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
	at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:325)
	at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:392)
	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:501)
	... 17 more
2015-10-30 10:38:12,573 DEBUG [FlushThread] wal.WALKey(102): mvcc=MultiVersionConcurrencyControl{readPoint=1052,
writePoint=1052}, writeEntry=null
2015-10-30 10:38:12,573 WARN  [FlushThread] regionserver.HRegion(2390): 9e577f13d0d4caa7d069cc53cd18ae64
: failed writing ABORT_FLUSH marker to WAL
java.io.InterruptedIOException
	at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:108)
	at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:131)
	at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeFlushMarker(WALUtil.java:75)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2387)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2078)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2040)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1931)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1857)
	at org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3797)
Caused by: java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown
Source)
	at java.util.concurrent.CountDownLatch.await(Unknown Source)
	at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:98)
	... 8 more
Already started clsoing
2015-10-30 10:38:12,576 DEBUG [main] regionserver.HRegion(1384): Closing testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.:
disabling compactions & flushes
2015-10-30 10:38:12,578 DEBUG [main] regionserver.HRegion(1411): Updates disabled for region
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.
2015-10-30 10:38:12,578 INFO  [main] regionserver.HRegion(2146): Flushing 1/1 column families,
memstore=16.16 KB
2015-10-30 10:38:12,579 WARN  [main] regionserver.DefaultMemStore(152): Snapshot called again
without clearing previous. Doing nothing. Another ongoing flush or did we fail last attempt?
2015-10-30 10:38:13,609 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:14,633 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:15,667 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:16,700 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:17,730 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:18,763 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:19,788 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:20,817 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:21,844 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:22,874 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:23,905 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:24,929 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:25,959 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:26,987 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:28,019 WARN  [main] regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
{code}

I don't have a dump but the logs that shows the issue. 

> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
>                 Key: HBASE-14495
>                 URL: https://issues.apache.org/jira/browse/HBASE-14495
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt, 14495v7.txt, 14495v9.txt
>
>
> This test goes zombie on us, most recently, here: https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on laptop when run in a loop.
> Its hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000007d07c3478> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
> 	- locked <0x00000007d07c3478> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
> 	- locked <0x00000007d07c34a8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
> 	at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
>             // TODO: Lets see if we hang here, if there is a scenario where an outstanding
reader
>             // with a read point is in advance of this write point.
>             mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it so must have had predilection that this might be dodgy... Let me
take a look... 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message