accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Assigned] (ACCUMULO-942) accumulo should be more resilient in the face of NN failures
Date Tue, 08 Jan 2013 04:28:12 GMT


Eric Newton reassigned ACCUMULO-942:

    Assignee: Eric Newton  (was: Keith Turner)
> accumulo should be more resilient in the face of NN failures
> ------------------------------------------------------------
>                 Key: ACCUMULO-942
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
> We experienced an NN failure on a large cluster.  The edit log was written to a RAIDed file system, but data sent to the edit log was still lost.  We suspect the drivers made promises they did not keep.
> This left Accumulo in a slightly corrupt state: a few references to files that were missing.
> Also, we have attempted to keep archived backup images of HDFS for disaster recovery.  This has not been helpful because Accumulo needs a highly consistent set of metadata, and a slightly older version of the file system confuses it.
> One defense is to use snapshots.  However, these work at the table level and are hard to coordinate with the HDFS snapshot.
> Another approach is to leave a short history of the files in the !METADATA table.  The
Google paper hints at keeping historical information:
> {quote}
> We also store secondary information in the METADATA table, including a log of all events pertaining to each tablet (such as when a server begins serving it). This information is helpful for debugging and performance analysis.
> {quote}
> I think it would also be helpful for disaster recovery.  It may require the GC to be
more sensitive to historical information about compactions.
> Alternatively, we should start looking into high-availability NNs and bookkeeper high-performance

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: