accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4000) log recovery failed after hard reset
Date Wed, 16 Sep 2015 15:53:46 GMT


Josh Elser commented on ACCUMULO-4000:

I remember I had done something here "recently" for automatically ignoring empty WALs or WALs
with corrupt headers. The assumption was that if we didn't get the sync of the header done,
there's nothing else worthwhile in the WAL to consider.

Is this unique to that case?

> log recovery failed after hard reset
> ------------------------------------
>                 Key: ACCUMULO-4000
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.2
>         Environment: very large cluster, accumulo 1.6.2, hadoop 2.5.0 (cdh 5.3)
>            Reporter: Eric Newton
>            Assignee: Eric Newton
> Had a hardware failure on a single node within a large cluster.  Tablets were migrated
> away, but one tablet would not recover.  The Closer run by the master to release the write
> lease on the WAL failed repeatedly.
> Afterwards, it was determined the file was small, probably just opened and used at the
> moment the machine failed.  The block could not be recovered from any replicas.
> One question raised: does the write pipeline acknowledge the sync, before the write pipeline

This message was sent by Atlassian JIRA