accumulo-notifications mailing list archives

From "Adam Fuchs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3272) tserver breaks in a bad way when it can't write to hdfs trash
Date Thu, 30 Oct 2014 15:25:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190225#comment-14190225 ]

Adam Fuchs commented on ACCUMULO-3272:
--------------------------------------

Here is a state transition diagram for a root tablet compaction:
{code}
Initial state: files A and B are ready to compact
1. Files: A,B,C   Refs: A,B,C
Compact generates new file
2. Files: A,B,C,D_tmp   Refs: A,B,C
Grab tablet lock
Wait for scans to finish
Prepare replacement, renaming old files to include a "delete" in their names
3. Files: dA,B,C,D_tmp   Refs: A,B,C
4. Files: dA,dB,C,D_tmp  Refs: A,B,C
Rename replacement, dropping the _tmp
5. Files: dA,dB,C,D   Refs: A,B,C
Finish replacement, removing files to be deleted (this is where the trash issue pops up)
6. Files: dB,C,D    Refs: A,B,C
7. Files: C,D       Refs: A,B,C
Clean up references, removing the old files from the tablet's view of current sources
8. Files: C,D     Refs: B,C
9. Files: C,D     Refs: C
Reference new file, adding the newly compacted file to the source reference set
10. Files: C,D    Refs: C,D
Exit critical section, releasing tablet lock
{code}
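
For concreteness, here is a minimal sketch of the file-replacement steps above, expressed with the stock Hadoop FileSystem API. This is not the actual RootFiles code; the "delete+" marker naming is taken from the log output in the description, and everything else is illustrative:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RootFileReplacementSketch {
  // Steps 3-7 above: mark the old files for deletion by renaming them,
  // promote the compaction output, then remove the marked files. An
  // IOException thrown anywhere in this sequence leaves HDFS and the
  // tablet's in-memory reference set out of sync.
  static void replaceFiles(FileSystem fs, Path dir, String oldA, String oldB,
      String newFile) throws IOException {
    // Steps 3-4: rename the old files to include a "delete" marker.
    fs.rename(new Path(dir, oldA), new Path(dir, "delete+" + newFile + "+" + oldA));
    fs.rename(new Path(dir, oldB), new Path(dir, "delete+" + newFile + "+" + oldB));
    // Step 5: drop the _tmp suffix from the compaction output.
    fs.rename(new Path(dir, newFile + "_tmp"), new Path(dir, newFile));
    // Steps 6-7: remove the marked files. With fs.trash.interval > 0 this
    // becomes a moveToTrash call, which is where this ticket's failure occurs.
    fs.delete(new Path(dir, "delete+" + newFile + "+" + oldA), false);
    fs.delete(new Path(dir, "delete+" + newFile + "+" + oldB), false);
  }
}
{code}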

The problem is that an exception thrown during any of the transitions from states 3 through 9 leaves the reference set inconsistent with the set of files existing in HDFS. The transitions from states 2 through 6 are HDFS operations that may throw IOExceptions, including the rename and delete (or moveToTrash) calls. Instead of leaving the critical section in that state, we should either take the tablet offline, or retry for a while and then take the tablet offline. Sean's suggestion is a better way of handling that one type of exception, but there are many other ways that IOExceptions can be thrown in this code. See the attached state diagram for a graphical description of this.
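
As a rough illustration of that proposal (not an existing Accumulo API), the handling could look something like the following, where HdfsTransition and takeTabletOffline are hypothetical names:
{code}
import java.io.IOException;

public class RetryThenOffline {
  interface HdfsTransition {
    void run() throws IOException;
  }

  // Retry a failing HDFS transition a bounded number of times; if it still
  // fails, take the tablet offline rather than exiting the critical section
  // with an inconsistent reference set.
  static void runOrOffline(HdfsTransition transition, Runnable takeTabletOffline)
      throws IOException {
    final int maxAttempts = 5; // arbitrary bound for illustration
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        transition.run();
        return;
      } catch (IOException e) {
        if (attempt == maxAttempts) {
          takeTabletOffline.run();
          throw e;
        }
        try {
          Thread.sleep(1000L * attempt); // simple linear backoff
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          takeTabletOffline.run();
          throw e;
        }
      }
    }
  }
}
{code}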

> tserver breaks in a bad way when it can't write to hdfs trash
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3272
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3272
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Adam Fuchs
>         Attachments: ProposedRootCompactionStateTransition.png
>
>
> When testing a vanilla install of HDP 2.1, the HDFS setting fs.trash.interval is
> set to 360 by default. Accumulo takes this to mean that it should move deleted
> files to the .Trash directory under Accumulo's HDFS home directory. In this
> instance, the home directory did not exist, which caused a major compaction to
> fail. The failure happened in such a way that the internal state of the tserver
> became inconsistent, preventing automatic recovery after an admin solved the
> trash problem.
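> A minimal sketch of the failing call path, using the stock Hadoop Trash API (the
> file path below is hypothetical):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.Trash;
>
> public class TrashSketch {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     conf.setLong("fs.trash.interval", 360); // the HDP 2.1 default seen here
>     FileSystem fs = FileSystem.get(conf);
>     // With a non-zero trash interval, deletes become moves into
>     // /user/<user>/.Trash; moveToTrash throws IOException ("Failed to move
>     // to trash") when that directory cannot be created, as in this report.
>     Trash trash = new Trash(fs, conf);
>     trash.moveToTrash(new Path("/accumulo/tables/+r/root_tablet/example.rf"));
>   }
> }
> {code}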
> The first stack trace below shows the initial problem. The second shows a
> secondary problem caused by the poor failure mode.
> {code}
> 2014-10-28 15:05:19,353 [tserver.Tablet] DEBUG: Major compaction plan: [hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/00000_00000.rf, hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf] propogate deletes : false
> 2014-10-28 15:05:19,353 [tserver.Tablet] DEBUG: MajC initiate lock 0.00 secs, wait 0.00 secs
> 2014-10-28 15:05:19,356 [tserver.Tablet] DEBUG: Starting MajC +r<< (NORMAL) [hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf, hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/00000_00000.rf] --> hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/A000003r.rf_tmp  []
> 2014-10-28 15:05:19,376 [tserver.TabletServer] DEBUG: Got getScans message from user: !SYSTEM
> 2014-10-28 15:05:19,432 [tserver.Compactor] DEBUG: Compaction +r<< 28 read | 18 written |    608 entries/sec |  0.046 secs
> 2014-10-28 15:05:19,482 [fs.TrashPolicyDefault] INFO : Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
> 2014-10-28 15:05:19,491 [fs.TrashPolicyDefault] WARN : Can't create trash directory: hdfs://n1.sqrrl-lab.net:8020/user/accumulo/.Trash/Current/accumulo_1.6_perf_test/tables/+r/root_tablet
> 2014-10-28 15:05:19,491 [tserver.Tablet] ERROR: MajC Failed, extent = +r<<
> 2014-10-28 15:05:19,491 [tserver.Tablet] ERROR: MajC Failed, message = Failed to move to trash: hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/delete+A000003r.rf+F000003q.rf
> java.io.IOException: Failed to move to trash: hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/delete+A000003r.rf+F000003q.rf
>         at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160)
>         at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:109)
>         at org.apache.accumulo.server.fs.VolumeManagerImpl.moveToTrash(VolumeManagerImpl.java:364)
>         at org.apache.accumulo.tserver.RootFiles.finishReplacement(RootFiles.java:64)
>         at org.apache.accumulo.tserver.RootFiles.replaceFiles(RootFiles.java:75)
>         at org.apache.accumulo.tserver.Tablet$DatafileManager.bringMajorCompactionOnline(Tablet.java:1001)
>         at org.apache.accumulo.tserver.Tablet._majorCompact(Tablet.java:3239)
>         at org.apache.accumulo.tserver.Tablet.majorCompact(Tablet.java:3340)
>         at org.apache.accumulo.tserver.Tablet.access$4800(Tablet.java:172)
>         at org.apache.accumulo.tserver.Tablet$CompactionRunner.run(Tablet.java:2804)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:42)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=accumulo, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:760)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2555)
>         at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
>         at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:136)
>         ... 15 more
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=accumulo, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:760)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy20.mkdirs(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>         at com.sun.proxy.$Proxy20.mkdirs(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
>         at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
>         ... 22 more
> {code}
> {code}
> 2014-10-28 15:05:20,558 [tserver.FileManager] ERROR: Failed to open file hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf File does not exist: /accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1728)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1671)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1651)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1625)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:503)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 2014-10-28 15:05:20,558 [problems.ProblemReports] DEBUG: Filing problem report +r FILE_READ hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf
> 2014-10-28 15:05:20,559 [tserver.TabletServer] WARN : exception while scanning tablet +r<<
> java.io.IOException: Failed to open hdfs://n1.sqrrl-lab.net:8020/accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf
>         at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:334)
>         at org.apache.accumulo.tserver.FileManager.access$500(FileManager.java:59)
>         at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:491)
>         at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:479)
>         at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:499)
>         at org.apache.accumulo.tserver.Tablet$ScanDataSource.createIterator(Tablet.java:1980)
>         at org.apache.accumulo.tserver.Tablet$ScanDataSource.iterator(Tablet.java:1942)
>         at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:165)
>         at org.apache.accumulo.tserver.Tablet.nextBatch(Tablet.java:1659)
>         at org.apache.accumulo.tserver.Tablet.access$3200(Tablet.java:172)
>         at org.apache.accumulo.tserver.Tablet$Scanner.read(Tablet.java:1799)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$NextBatchTask.run(TabletServer.java:1041)
>         at org.apache.accumulo.tserver.TabletServerResourceManager.executeReadAhead(TabletServerResourceManager.java:642)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.continueScan(TabletServer.java:1278)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startScan(TabletServer.java:1247)
>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.accumulo.trace.instrument.thrift.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
>         at org.apache.accumulo.server.util.RpcWrapper$1.invoke(RpcWrapper.java:44)
>         at com.sun.proxy.$Proxy21.startScan(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2179)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2163)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:168)
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>         at org.apache.accumulo.server.util.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:77)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.FileNotFoundException: File does not exist: /accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1728)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1671)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1651)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1625)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:503)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1144)
>         at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1132)
>         at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1122)
>         at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:264)
>         at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:231)
>         at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:224)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1295)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
>         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:261)
>         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:144)
>         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:216)
>         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:318)
>         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:372)
>         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:144)
>         at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:825)
>         at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
>         at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(FileOperations.java:119)
>         at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:315)
>         ... 32 more
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /accumulo_1.6_perf_test/tables/+r/root_tablet/F000003q.rf
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1728)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1671)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1651)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1625)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:503)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>         at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:219)
>         at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1142)
>         ... 53 more
> {code}
> I'm not sure whether this could cause inconsistencies that are visible to the
> end user, but it seems possible. The goal of this ticket is to improve the
> failure mode rather than to fully handle the case where the trash policy isn't
> supported by the underlying infrastructure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
