accumulo-dev mailing list archives

From "Keith Turner (Created) (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-444) Data loss possible when tablet killed immediately after recovery
Date Mon, 05 Mar 2012 23:09:57 GMT
Data loss possible when tablet killed immediately after recovery

                 Key: ACCUMULO-444
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.3.5
         Environment: Running random walk, continuous ingest, and agitator on 10 node cluster.
            Reporter: Keith Turner
            Assignee: Keith Turner
            Priority: Blocker
             Fix For: 1.4.0, 1.3.6

I came in after a weekend of running tests to find that the Shard random walk test had lost
data in its index table.  After debugging, I found that the following sequence of events occurred.

 * Mutation X was written to shard index on Tablet T1
 * X was minor compacted to file F1
 * Tablet server serving T1 was killed
 * When T1 came up on another tablet server, it did not know about F1

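The above sequence of events indicates that the !METADATA table lost data.  So I started looking
into that, and found the following sequence of events (below, T1, T2, and T3 after "tablet
server" name servers; "tablet T1" is still the tablet from the list above).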

 * Tablet server T1 serving METADATA tablet MT was killed
 * MT comes up on another tablet server T2
 * Mutation Y is written to MT about file F1 for tablet T1
 * Tablet server T2 is killed
 * MT comes up on tablet server T3
 * The mutations for MT from T1 are recovered, but not those from T2, therefore Y is lost
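
The sequence above can be illustrated with a toy model.  This is only a sketch, not Accumulo
code: it assumes recovery replays a list of per-server logs for MT, and the names recover,
log_refs, logs, and the extra mutation W are hypothetical.

```python
# Toy model only: all names here are hypothetical, not Accumulo's implementation.

def recover(log_refs, logs):
    """Replay mutations from every per-server log that the tablet references."""
    recovered = []
    for server in log_refs:
        recovered.extend(logs[server])
    return recovered

# Unflushed mutations for metadata tablet MT, grouped by the server that logged them.
logs = {
    "T1": ["W"],  # W: a hypothetical earlier mutation written while MT was on T1
    "T2": ["Y"],  # Y: the mutation about file F1, written while MT was on T2
}

# Outcome described in the report: only T1's mutations are replayed on T3.
buggy = recover(["T1"], logs)

# Outcome one would expect: mutations from both T1 and T2 are replayed.
expected = recover(["T1", "T2"], logs)

print("Y" in buggy)     # False
print("Y" in expected)  # True
```

In this model Y is lost only because T2 never makes it into the list of logs to replay; the
mutation itself was logged.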

There is code that is supposed to handle this situation, but it's not working.  I think this
issue exists in 1.3.

Data loss is not certain in this situation.  In the scenario above, when MT is loaded on T2,
a minor compaction is started.  If the server is killed before this minor compaction completes,
then data loss will likely occur.

