accumulo-notifications mailing list archives

From "William Slacum (JIRA)"
Subject [jira] [Created] (ACCUMULO-3727) FileNotFoundException on failed/data during recovery
Date Tue, 14 Apr 2015 19:17:58 GMT
William Slacum created ACCUMULO-3727:

             Summary: FileNotFoundException on failed/data during recovery
                 Key: ACCUMULO-3727
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.5.2
            Reporter: William Slacum

Overnight there was a mass failure of Accumulo (most likely due to too many mappers for a
job). After restarting Accumulo, one of the metadata tablets failed to load. There was a log
message showing a `FileNotFoundException` on the file `hdfs:///accumulo/recovery/<log id>/failed/data`.
Removing the `<log id>` directory from HDFS seemed to unclog the jam and things came
back (though potentially with data loss).

I wanted to investigate why, somewhere in the plumbing of `TabletServer`, `TabletServerLogger`,
and `SortedLogRecovery`, an attempt was made to use the `failed` file.

I see in `SortedLogRecovery#sort` where the marker file gets created:

    public void sort(String name, Path srcPath, String destPath) {
      // ... (sorting logic elided)
      } catch (Throwable t) {
        try {
          // parent dir may not exist
          fs.mkdirs(new Path(destPath));
          // zero-length marker file flagging that the sort failed
          fs.create(new Path(destPath, "failed")).close();
        } catch (IOException e) {
          log.error("Error creating failed flag file " + name, e);
        }
        log.error(t, t);
      } finally {
        // ... (cleanup elided)

I have not yet traced where or why the `failed` file gets included in the list
of recovered data directories.
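
The path `failed/data` suggests that the `failed` marker was treated as a directory of sorted log data and that a `data` file was then looked up inside it. A minimal sketch of one possible guard follows, assuming that the recovery code lists the children of the recovery directory and that only child directories hold sorted data. This is not Accumulo's code: it uses `java.nio.file` on a local directory in place of the Hadoop `FileSystem` API, and the names `RecoveryDirFilter`, `sortedLogDirs`, and `part-r-00000` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RecoveryDirFilter {

  // Hypothetical helper: returns the names of the children of recoveryDir
  // that are directories. Plain files such as the zero-length "failed"
  // marker are skipped, so nothing later tries to open "failed/data".
  static List<String> sortedLogDirs(Path recoveryDir) throws IOException {
    List<String> result = new ArrayList<>();
    try (DirectoryStream<Path> children = Files.newDirectoryStream(recoveryDir)) {
      for (Path child : children) {
        if (Files.isDirectory(child)) {
          result.add(child.getFileName().toString());
        }
      }
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway layout: one data directory plus a "failed" marker file.
    Path recoveryDir = Files.createTempDirectory("recovery");
    Files.createDirectory(recoveryDir.resolve("part-r-00000"));
    Files.createFile(recoveryDir.resolve("failed"));

    System.out.println(sortedLogDirs(recoveryDir));
  }
}
```

Whether skipping the marker is the right behavior is a separate question: a `failed` marker presumably means the sort did not complete, so a real fix might need to retry the sort or report the failure rather than silently ignore the file.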

This message was sent by Atlassian JIRA