accumulo-dev mailing list archives

From: "Keith Turner (Created) (JIRA)" <j...@apache.org>
Subject: [jira] [Created] (ACCUMULO-444) Data loss possible when tablet killed immediately after recovery
Date: Mon, 05 Mar 2012 23:09:57 GMT

Data loss possible when tablet killed immediately after recovery
----------------------------------------------------------------

                 Key: ACCUMULO-444
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-444
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.3.5
         Environment: Running random walk, continuous ingest, and agitator on 10 node cluster.
            Reporter: Keith Turner
            Assignee: Keith Turner
            Priority: Blocker
             Fix For: 1.4.0, 1.3.6


After a weekend of running tests, I found that the Shard random walk test had lost data in
its index table.  After debugging, I found that the following sequence of events had occurred.

 * Mutation X was written to shard index on Tablet T1
 * X was minor compacted to file F1
 * Tablet server serving T1 was killed
 * When T1 came up on another tablet server, it did not know about F1

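The above sequence of events indicates that the !METADATA table lost data.  So I started looking
into that, and found the following sequence of events.  (Below, T1, T2, and T3 name tablet
servers; "tablet T1" still refers to the tablet from the first sequence.)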

 * Tablet server T1 serving METADATA tablet MT was killed
 * MT comes up on another tablet server T2
 * Mutation Y is written to MT about file F1 for tablet T1
 * Tablet server T2 is killed.
 * MT comes up in tablet server T3
 * The mutations for MT from T1 are recovered, but not those from T2. Therefore Y is lost
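
The sequence above can be illustrated with a toy model. This is only a sketch in Python, not Accumulo code: the `recover` function, the dictionary of per-server write-ahead logs, and the earlier file entry "F0" are all hypothetical names made up for illustration. It assumes each tablet server logs the mutations it accepted for MT, and that recovery rebuilds MT by replaying a chosen list of those logs.

```python
def recover(write_ahead_logs, replayed_servers):
    """Rebuild a tablet's state by replaying the named servers' logs in order."""
    state = {}
    for server in replayed_servers:
        for key, value in write_ahead_logs.get(server, []):
            state[key] = value
    return state


# Hypothetical log contents for metadata tablet MT, keyed by tablet server.
write_ahead_logs = {
    "T1": [("F0", "file entry for tablet T1")],  # assumed earlier mutation, logged on server T1
    "T2": [("F1", "file entry for tablet T1")],  # mutation Y, logged on server T2
}

# What recovery on T3 should replay versus what the report says it replayed.
expected = recover(write_ahead_logs, ["T1", "T2"])
observed = recover(write_ahead_logs, ["T1"])

# Entries that should exist in MT but are missing after the faulty recovery.
lost = expected.keys() - observed.keys()
print(sorted(lost))  # prints ['F1']
```

In this model the entry for F1 disappears only because the log from T2 is skipped, which matches the symptom in the first sequence: tablet T1 comes up without knowing about F1.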

There is code that is supposed to handle this situation, but it is not working. I think this
issue exists in 1.3.

Data loss is not certain in this situation.  In the scenario above, a minor compaction is
started when MT is loaded on T2.  If T2 is killed before this minor compaction completes,
then data loss will likely occur.

  


