From: "Keith Turner (Created) (JIRA)"
To: accumulo-dev@incubator.apache.org
Reply-To: accumulo-dev@incubator.apache.org
Date: Mon, 5 Mar 2012 23:09:57 +0000 (UTC)
Message-ID: <1173611106.24514.1330988997299.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (ACCUMULO-444) Data loss possible when tablet killed immediately after recovery

Data loss possible when tablet killed immediately after recovery
----------------------------------------------------------------

                 Key: ACCUMULO-444
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-444
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.3.5
         Environment: Running random walk, continuous ingest, and agitator on 10 node cluster.
            Reporter: Keith Turner
            Assignee: Keith Turner
            Priority: Blocker
             Fix For: 1.4.0, 1.3.6


Came in after a weekend of running tests to find that the Shard random walk test had lost data in its index table. After debugging, I found that the following sequence of events had occurred.

 * Mutation X was written to the shard index on Tablet T1
 * X was minor compacted to file F1
 * The tablet server serving T1 was killed
 * When T1 came up on another tablet server, it did not know about F1

The above sequence of events indicates that the !METADATA table lost data. So I started looking into that, and found the following sequence of events.

 * Tablet server T1 serving METADATA tablet MT was killed
 * MT comes up on another tablet server T2
 * Mutation Y is written to MT about file F1 for tablet T1
 * Tablet server T2 is killed
 * MT comes up on tablet server T3
 * The mutations for MT from T1 are recovered, but not those from T2; therefore Y is lost

There is code that is supposed to handle this situation, but it's not working. I think this issue exists in 1.3.

Data loss is not certain in this situation. In the scenario above, when MT is loaded on T2, a minor compaction is started. If the server is killed before this minor compaction completes, then data loss will likely occur.
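
The second sequence of events in the report can be sketched as a toy model. This is only an illustration, not Accumulo code: the report does not say why the mutations from T2 were skipped, so the sketch assumes, purely for the sake of the example, that a tablet keeps a set of references to the write-ahead logs it must replay and that the reference to T2's log was never recorded. The names ToyCluster, log_refs, record_log_ref, and simulate are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model: every tablet server has its own write-ahead log,
    and each tablet remembers which servers' logs to replay on recovery."""
    wals: dict = field(default_factory=dict)      # server -> [(tablet, mutation)]
    log_refs: dict = field(default_factory=dict)  # tablet -> {server, ...}

    def write(self, server, tablet, mutation, record_log_ref=True):
        # The mutation always reaches the server's log ...
        self.wals.setdefault(server, []).append((tablet, mutation))
        # ... but the tablet's reference to that log is assumed to be optional
        # here, to model the failure described in the report.
        if record_log_ref:
            self.log_refs.setdefault(tablet, set()).add(server)

    def recover(self, tablet):
        # Replay only the logs the tablet is known to reference.
        recovered = []
        for server in sorted(self.log_refs.get(tablet, ())):
            recovered += [m for t, m in self.wals.get(server, []) if t == tablet]
        return recovered


def simulate(record_log_ref_on_t2):
    cluster = ToyCluster()
    # MT is served by T1, which logs an earlier mutation and is then killed.
    cluster.write("T1", "MT", "earlier-metadata")
    # MT comes up on T2; mutation Y (the entry for file F1) is written.
    cluster.write("T2", "MT", "Y", record_log_ref=record_log_ref_on_t2)
    # T2 is killed; MT comes up on T3 and recovers from the referenced logs.
    return cluster.recover("MT")


if __name__ == "__main__":
    print("T2 log reference missing: ", simulate(False))
    print("T2 log reference recorded:", simulate(True))
```

Under this assumption, recovery on T3 replays only T1's log when the reference to T2's log is missing, so Y never reappears in MT and tablet T1 later comes up without knowing about F1.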