hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
Date Fri, 09 Jun 2017 05:07:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043940#comment-16043940
] 

ramkrishna.s.vasudevan commented on HBASE-18186:
------------------------------------------------

I verified this test case TestCompactionArchiveIOException. It seems doing the right thing
but if we specifically get fileNotFoundException - I think we can be sure that it was already
been archived and remove that also from the compactedFile list.
Infact to confirm this shall we do file.exists() check in the 'archive' path?

> Frequent FileNotFoundExceptions in region server logs
> -----------------------------------------------------
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundException in regionserver logs on multiple code paths trying
to reference non existing store files. I know that there have been multiple bugs in store
file accounting of compacted store files. Examples include: HBASE-16964 , HBASE-16754 and
HBASE-16788.
> Observations:  
> 1. The issue mentioned here also seems to bear a similar flavor, because we are not seeing
rampant dataloss given the frequency of these exceptions in the logs. So, it's more likely
an accounting issue, but I could be wrong. 
> 2. The frequency with which this happens on scan heavy workload is at least one order
of magnitude higher than a mixed workload.
> Stack traces:
> {Code}
> WARN backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile,
file:hdfs://<region_dir>/<cf>/<store_file> because it does not exist! Skipping
and continuing on.
> java.io.FileNotFoundException: File/Directory <region_dir>/<cf>/<store_file>
does not exist.
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
> 	at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> 	at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
> 	at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
> 	at org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
> 	at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
> 	at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
> 	at org.apache.hadoop.hbase.regionserver.HStore.closeAndArchiveCompactedFiles(HStore.java:2672)
> 	at org.apache.hadoop.hbase.regionserver.CompactedHFilesDischargeHandler.process(CompactedHFilesDischargeHandler.java:43)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File/Directory
<region_dir>/<cf>/<store_file> does not exist.
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy16.setTimes(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:854)
> 	at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy17.setTimes(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:302)
> 	at com.sun.proxy.$Proxy18.setTimes(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3113)
> 	... 18 more
> {Code}
> {Code}
> ERROR regionserver.CompactSplitThread: Compaction selection failed Store = <cf>,
pri = 289
> java.io.FileNotFoundException: File does not exist: hdfs://<region_dir>/<cf>/<store_file>
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
> 	at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> 	at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:64)
> 	at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> 	at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1661)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:369)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:494)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:564)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {Code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message