accumulo-notifications mailing list archives

From "Luke Brassard (JIRA)" <>
Subject [jira] [Updated] (ACCUMULO-2333) "File does not exist" error during client ingest with agitation
Date Fri, 07 Feb 2014 23:35:19 GMT


Luke Brassard updated ACCUMULO-2333:

    Attachment: tserver_slave05.log

On a cluster with 15 slaves, two of the participating tablet servers had log entries referencing
the file.

slave05 was killed by the agitator at 20:33 and restarted at 20:43, at which point it
immediately compacted {{F0000dwj.rf}}. That file had been created by slave03 at 20:34, while
slave05 was offline. slave03, which appears to have previously been responsible for the file,
then tried to perform a MajC at 21:10, which caused the exceptions to appear in the monitor.
The master was also killed at 21:02 and revived at 21:05. It appears that the "missing"
extent was never unloaded and reassigned before the failure.

slave03 also reported RuntimeExceptions at about 20:34, so its actions at that time may not
have completed cleanly.

I'm attaching logs from the pertinent servers covering this time period.

> "File does not exist" error during client ingest with agitation
> ---------------------------------------------------------------
>                 Key: ACCUMULO-2333
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Luke Brassard
>         Attachments: master.log, tserver_slave03.log, tserver_slave05.log
> While running the agitator during a client ingest test, I encountered a "File does not
> exist" error that persisted in the Table Problems section of the monitor page.
> I confirmed that the file in question had previously been compacted away.
> While it appears that no data was lost, it is strange that the error surfaced and then
> seemed to right itself shortly thereafter (although the Table Problems section was never updated).
> Here is the stacktrace from the Monitor:
> {code}
> File does not exist: /accumulo/tables/2/t-00000dj/F0000dwj.rf
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
>     at org.apache.hadoop.ipc.RPC$
>     at org.apache.hadoop.ipc.Server$Handler$
>     at org.apache.hadoop.ipc.Server$Handler$
>     at Method)
>     at
>     at
>     at org.apache.hadoop.ipc.Server$
> {code}

This message was sent by Atlassian JIRA
