accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4000) log recovery failed after hard reset
Date Wed, 16 Sep 2015 16:02:46 GMT


Eric Newton commented on ACCUMULO-4000:

Yes, I think so. The file had a block allocated to it and was 93 bytes long (how the
NN knew this, I have not yet determined), but there was no copy of the block on any other
server.  I could not copy the file locally.  The first issue was that the lease release failed.

I should mention that several racks were being decommissioned, so there was a lot of block
management at the time.

> log recovery failed after hard reset
> ------------------------------------
>                 Key: ACCUMULO-4000
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.2
>         Environment: very large cluster, accumulo 1.6.2, hadoop 2.5.0 (cdh 5.3)
>            Reporter: Eric Newton
>            Assignee: Eric Newton
> Had a hardware failure on a single node within a large cluster.  Tablets were migrated
> away, but one tablet would not recover.  The Closer run by the master to release the write
> lease on the WAL failed repeatedly.
> Afterwards, it was determined the file was small, probably just opened and used at the
> moment the machine failed.  The block could not be recovered from any replicas.
> One question raised: does the write pipeline acknowledge the sync, before the write pipeline

This message was sent by Atlassian JIRA
