accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-4031) consistency check failure
Date Thu, 15 Oct 2015 19:44:05 GMT
Eric Newton created ACCUMULO-4031:

             Summary: consistency check failure
                 Key: ACCUMULO-4031
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.6.4
         Environment: Very large production cluster
            Reporter: Eric Newton

Sorry for the lack of concrete details, but my logs are not online.

This system does a lot of bulk ingest.  When it was shut down, a few tablets complained about
inconsistency of their file list with what was in the metadata table.

I tracked one down; the others appear to be similar.

First, the inconsistency was an "extra" bulk import file in the metadata table, which was
missing from the in-memory list.
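
As a rough illustration of the kind of check that flags this, the comparison can be sketched as a set difference between the two file lists. The function name and file names below are hypothetical; this is not Accumulo's actual consistency check.

```python
def compare_file_lists(in_memory, metadata):
    """Compare a tablet's in-memory file list with its metadata entries.

    Returns (extra_in_metadata, missing_from_metadata) as sorted lists.
    Hypothetical helper for illustration only.
    """
    in_memory = set(in_memory)
    metadata = set(metadata)
    return sorted(metadata - in_memory), sorted(in_memory - metadata)


# Made-up file names: one bulk import file is present only in the metadata.
extra, missing = compare_file_lists(
    in_memory=["F0001.rf", "F0002.rf"],
    metadata=["F0001.rf", "F0002.rf", "I0003.rf"],
)
```

In the case described here, the "extra" list would hold the bulk import file and the "missing" list would be empty.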

There was an attempt to bulk load the file into the tablet, but the bulk load failed.  It failed
due to a constraint violation: the bulk transaction was no longer running.

Except, really, it was. The constraint fired during the update of the tablet's metadata.
The server of the metadata tablet was having a (brief) connection problem with ZooKeeper,
which is where the bulk transaction status is stored.
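
A minimal sketch of how a status lookup could turn a connection problem into a false "not running" answer is shown below. Everything in it (FlakyStore, lookup_status, naive_constraint_ok, the transaction id) is a hypothetical stand-in; whether the real constraint code behaves this way is an assumption, not something confirmed here.

```python
from enum import Enum


class TxStatus(Enum):
    RUNNING = "running"
    NOT_RUNNING = "not running"
    UNKNOWN = "unknown"  # the status store could not be reached


class FlakyStore:
    """Hypothetical stand-in for the ZooKeeper-backed transaction status."""

    def __init__(self, running_tids, connected=True):
        self.running_tids = set(running_tids)
        self.connected = connected

    def is_running(self, tid):
        if not self.connected:
            raise ConnectionError("status store unreachable")
        return tid in self.running_tids


def lookup_status(store, tid):
    """Keep 'could not ask' separate from 'asked, and it is not running'."""
    try:
        if store.is_running(tid):
            return TxStatus.RUNNING
        return TxStatus.NOT_RUNNING
    except ConnectionError:
        return TxStatus.UNKNOWN


def naive_constraint_ok(store, tid):
    """Accept only a confirmed RUNNING status.

    UNKNOWN is collapsed into a rejection, so a brief outage looks the
    same to the caller as a transaction that has really finished.
    """
    return lookup_status(store, tid) is TxStatus.RUNNING


store = FlakyStore(running_tids={42}, connected=False)
rejected_during_outage = not naive_constraint_ok(store, 42)
store.connected = True
accepted_after_recovery = naive_constraint_ok(store, 42)
```

With the made-up transaction id 42 still running, the check rejects the update while the store is unreachable and accepts it once the connection returns.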

The importing tablet server saw the constraint violation, and didn't add the file to the in-memory
list.  However, 2 minutes later, the bulk import was retried, and (consulting the metadata
table), it claimed the file *was* imported already.

So, in the intervening 2 minutes, somehow the update was made to the metadata tablet.

 * No splitting of either tablet occurred during this event.
 * Neither tablet was moved during this event.
 * No recovery of the metadata table took place.
 * The tablet server never reported the file imported.

I reviewed the handling of constraints, and it looks correct (despite ACCUMULO-4029).

With ACCUMULO-3327, the tablet server will not reject retries, because it does not re-consult
the metadata table.

I don't know how the mutations would be applied without the tablet server reporting the file
as loaded.
