Return-Path:
X-Original-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Delivered-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 030B810C9D
	for ; Thu, 2 Jan 2014 14:18:54 +0000 (UTC)
Received: (qmail 70514 invoked by uid 500); 2 Jan 2014 14:18:08 -0000
Delivered-To: apmail-accumulo-notifications-archive@accumulo.apache.org
Received: (qmail 70248 invoked by uid 500); 2 Jan 2014 14:17:55 -0000
Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: jira@apache.org
Delivered-To: mailing list notifications@accumulo.apache.org
Received: (qmail 70160 invoked by uid 99); 2 Jan 2014 14:17:53 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 02 Jan 2014 14:17:53 +0000
Date: Thu, 2 Jan 2014 14:17:52 +0000 (UTC)
From: "Eric Newton (JIRA)"
To: notifications@accumulo.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (ACCUMULO-1940) Data file in !METADATA differs from in memory data
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/ACCUMULO-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton updated ACCUMULO-1940:
----------------------------------
    Labels: 16_qa_bug  (was: )

> Data file in !METADATA differs from in memory data
> --------------------------------------------------
>
>                 Key: ACCUMULO-1940
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1940
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0
>            Reporter: Josh Elser
>            Assignee: Eric Newton
>              Labels: 16_qa_bug
>             Fix For: 1.4.5, 1.5.1, 1.6.0
>
>
> Found during
> CI run with agitation.
> Got the first two error messages 5 times (assuming in a retry on failure block):
> {noformat}
> Failed to do close consistency check for tablet c;79d0ab;7870a
> java.lang.RuntimeException: Data file in !METADATA differ from in memory data c;79d0ab;7870a  {/t-0005h1j/A0005n8k.rf=797350457 19198312, /t-0005h1j/C0005skm.rf=798078368 19322025, /t-0005h1j/C0005tet.rf=89783168 2196349, /t-0005h1j/C0005u20.rf=90979448 2227972, /t-0005h1j/F0005u0v.rf=23410023 582233, /t-0005h1j/F0005u2p.rf=21958551 547159, /t-0005h1j/F0005u3g.rf=14395121 358893}  {/t-0005h1j/A0005n8k.rf=797350457 19198312, /t-0005h1j/C0005skm.rf=798078368 19322025, /t-0005h1j/C0005tet.rf=89783168 2196349, /t-0005h1j/C0005u20.rf=90979448 2227972, /t-0005h1j/F0005u2p.rf=21958551 547159, /t-0005h1j/F0005u3g.rf=14395121 358893}
>         at org.apache.accumulo.server.tabletserver.Tablet.closeConsistencyCheck(Tablet.java:2847)
>         at org.apache.accumulo.server.tabletserver.Tablet.completeClose(Tablet.java:2780)
>         at org.apache.accumulo.server.tabletserver.Tablet.close(Tablet.java:2658)
>         at org.apache.accumulo.server.tabletserver.TabletServer$UnloadTabletHandler.run(TabletServer.java:2357)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> Then, we logged that we failed the consistency check
> {noformat}
> Consistency check fails, retrying java.lang.RuntimeException: Failed to do close consistency check for tablet c;79d0ab;7870a
> {noformat}
> In the end, we gave up and closed it anyways.
> {noformat}
> Tablet closed consistency check has failed for c;79d0ab;7870a giving up and closing
> {noformat}
> Before all of this happened, we tried to bring this tablet online after a failure on a new tserver. During the minc as part of the recovery process, we failed to get the lease on the .rf_tmp file we tried to create. We failed this a couple of times, but eventually got the tmp file we needed and the recovery process completed and we could bring the tablet online. The difference between the in-memory version and the !METADATA version was this one flushed rfile that we created during this recovery process.
> The problem eventually fixed itself because the tablet was migrated to a different server and we just took what was (correctly) in the !METADATA table.
> There still is an unknown issue of how we missed the flush RFile in the DatafileManager's copy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)