hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
Date Fri, 09 Jun 2017 08:23:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044134#comment-16044134
] 

ramkrishna.s.vasudevan commented on HBASE-18186:
------------------------------------------------

This and HBASE-17406 I can see another way this could happen - mainly the StackTrace of file
not found during the time of opening the scanners,
->Thread 1 creates a scan and it acquires the HStore's readLock and tries to get all the
storeFiles over which the StorefileScanner has to be created. So it has just come out of the
read lock and still the scanners are not created.
->Thread 2 - a compaction is completed where the store files involved in the scanner of
Thread 1 are actually compacted. It acquires the write lock to update the compactedFiles.
These compactedFiles are marked compactedAway.
->Thread 3 (The CompactionDischareHandler) starts and it acquires the read lock to find
the list of compactedfiles. It gets all the files created by Thread 2 and continues with archiving
those files. At this point the refCount is still 0 for those storefiles. So it can see these
files are marked compactedAway but refCount is still 0 and so it can remove those files for
archiving.

->Now Thread 1 continues with incrementing the refcount to 1 on the Storefiles used for
scanning. But by this time the file could have been archived. 
Am not sure if a test case will be possible here. But let me check by adding some debug points
if the above can be reproduced. How ever the soln is simple is like we could extend the readLock
in Thread 1 to be held till the StorefileScanners are created because the only costly step
there is creation of readers and that is already done when the storeFiles are actually compacted/flushed.


> Frequent FileNotFoundExceptions in region server logs
> -----------------------------------------------------
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundException in regionserver logs on multiple code paths trying
to reference non existing store files. I know that there have been multiple bugs in store
file accounting of compacted store files. Examples include: HBASE-16964 , HBASE-16754 and
HBASE-16788.
> Observations:  
> 1. The issue mentioned here also seems to bear a similar flavor, because we are not seeing
rampant dataloss given the frequency of these exceptions in the logs. So, it's more likely
an accounting issue, but I could be wrong. 
> 2. The frequency with which this happens on scan heavy workload is at least one order
of magnitude higher than a mixed workload.
> Stack traces:
> {Code}
> WARN backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile,
file:hdfs://<region_dir>/<cf>/<store_file> because it does not exist! Skipping
and continuing on.
> java.io.FileNotFoundException: File/Directory <region_dir>/<cf>/<store_file>
does not exist.
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
> 	at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> 	at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
> 	at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
> 	at org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
> 	at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
> 	at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
> 	at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
> 	at org.apache.hadoop.hbase.regionserver.HStore.closeAndArchiveCompactedFiles(HStore.java:2672)
> 	at org.apache.hadoop.hbase.regionserver.CompactedHFilesDischargeHandler.process(CompactedHFilesDischargeHandler.java:43)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File/Directory
<region_dir>/<cf>/<store_file> does not exist.
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy16.setTimes(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:854)
> 	at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy17.setTimes(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:302)
> 	at com.sun.proxy.$Proxy18.setTimes(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3113)
> 	... 18 more
> {Code}
> {Code}
> ERROR regionserver.CompactSplitThread: Compaction selection failed Store = <cf>,
pri = 289
> java.io.FileNotFoundException: File does not exist: hdfs://<region_dir>/<cf>/<store_file>
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
> 	at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> 	at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:64)
> 	at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> 	at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1661)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:369)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:494)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:564)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {Code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message