hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14327) TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
Date Fri, 28 Aug 2015 16:11:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720087#comment-14720087 ]

stack commented on HBASE-14327:
-------------------------------

So the archiver is running in between the close and the open of the region in its new location? Is the removed file the output of the compaction, or one of the inputs to the compaction on the original region? If the archiver thinks it is safe to remove a file even though the compaction has not 'completed' yet, that sounds like another issue. In this case, if the archiver is moving the file aside before the new region open has a chance to work on it, then yeah, delay or stop the archiver in the test.

Thank you for digging in on this [~chenheng]
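
If we end up going the delay-the-archiver route, here is a rough sketch of what the gating could look like, reusing the CountDownLatch pattern that TestIOFencing$CompactionBlockerRegion already uses to hold up compactions. To be clear, the ArchiveGate class below and the archiver-side hook it implies are hypothetical, not existing code:

{code:java}
import java.util.concurrent.CountDownLatch;

/**
 * Sketch only: a one-shot gate the test could interpose in front of the
 * compacted-file archiving, mirroring how CompactionBlockerRegion gates
 * compactions. Where exactly this would be wired into the archiver path
 * is left open; no such switch exists in HBase today.
 */
public class ArchiveGate {
  private final CountDownLatch released = new CountDownLatch(1);

  /** Hypothetical archiver-side call: park before moving files aside. */
  public void awaitRelease() throws InterruptedException {
    released.await();
  }

  /** Test-side call: let archiving proceed. */
  public void release() {
    released.countDown();
  }
}
{code}

The test would call release() only after it has confirmed the region is open and serving at its new location, so the store files are still in place when the re-opened region builds its readers.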

> TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
> --------------------------------------------------------------
>
>                 Key: HBASE-14327
>                 URL: https://issues.apache.org/jira/browse/HBASE-14327
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Dima Spivak
>            Priority: Critical
>
> I'm looking into some more of the flaky tests on trunk and this one seems to be particularly gross, failing about half the time in recent days. Some probably-relevant output from [a recent run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]:
> {noformat}
> 2015-08-27 18:50:14,318 INFO  [main] hbase.TestIOFencing(326): Allowing compaction to proceed
> 2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions
> 2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction...
> 2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction...
> 2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files.
> 2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files.
> 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464
> 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe
> 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
> 2015-08-27 18:50:14,342 INFO  [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 2 (all) file(s) in family of tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. into e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total size for store is 146.9 K. This selection was in queue for 0sec, and took 10sec to execute.
> 2015-08-27 18:50:14,343 INFO  [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(527): Completed compaction: Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=73.1 K, priority=998, time=525052314434020; duration=10sec
> 2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b
> 2015-08-27 18:50:14,347 DEBUG [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1405): Updates disabled for region tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
> 2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(550): CompactSplitThread Status: compaction_queue=(0:3), split_queue=0, merge_queue=0
> 2015-08-27 18:50:14,348 ERROR [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.CompactSplitThread$CompactionRunner(547): Compaction failed Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=51.6 K, priority=998, time=525051341420855
> java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(HStore.java:1343)
> 	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1259)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1812)
> 	at org.apache.hadoop.hbase.TestIOFencing$CompactionBlockerRegion.compact(TestIOFencing.java:127)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> 2015-08-27 18:50:14,349 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.CompactSplitThread$CompactionRunner(550): CompactSplitThread Status: compaction_queue=(0:3), split_queue=0, merge_queue=0
> 2015-08-27 18:50:14,351 INFO  [StoreCloserThread-tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.-1] regionserver.HStore(889): Closed family
> 2015-08-27 18:50:14,352 ERROR [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1493): Memstore size is 62664
> 2015-08-27 18:50:14,353 INFO  [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1506): Closed tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
> 2015-08-27 18:50:14,353 DEBUG [RS_CLOSE_REGION-hemera:35619-0] handler.CloseRegionHandler(122): Closed tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
> 2015-08-27 18:50:14,361 ERROR [RS:1;hemera:39431-shortCompactions-1440701414271] regionserver.CompactSplitThread$CompactionRunner(541): Compaction failed Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=9, fileSize=289.6 K (25.6 K, 47.0 K, 26.0 K, 68.7 K, 26.1 K, 26.1 K, 26.1 K, 26.1 K, 18.0 K), priority=991, time=525062316035397
> java.io.FileNotFoundException: File does not exist: /user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> 	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1167)
> 	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
> 	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
> 	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
> 	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
> 	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
> 	at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:160)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
> 	at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:106)
> 	at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:82)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:248)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:385)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:492)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:115)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:99)
> 	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createFileScanners(Compactor.java:203)
> 	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:70)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> 	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1231)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1812)
> 	at org.apache.hadoop.hbase.TestIOFencing$CompactionBlockerRegion.compact(TestIOFencing.java:127)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
> 	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
> 	at com.sun.proxy.$Proxy22.getBlockLocations(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
> 	... 29 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
