Date: Thu, 18 Sep 2014 15:14:34 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Comment Edited] (ACCUMULO-3140) Compaction did not run during RW test

    [ https://issues.apache.org/jira/browse/ACCUMULO-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139042#comment-14139042 ]

Keith Turner edited comment on ACCUMULO-3140 at 9/18/14 3:14 PM:
-----------------------------------------------------------------

I was thinking about how to fix this. A very simple fix would be the following change in {{Tablet.compactAll()}}. However, this change could lead to starvation of a user-initiated major compaction in the case where flushes/minor compactions are constantly running on a tablet.
Also, user-initiated major compactions can optionally flush; if the compact operation is not flushing, then it should probably not wait here.

{code}
diff --git a/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java b/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
index bb13ff8..13e4292 100644
--- a/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
+++ b/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
@@ -3901,7 +3901,7 @@ public class Tablet {
       if (lastCompactID >= compactionId)
         return;
 
-      if (closing || closed || majorCompactionQueued.contains(MajorCompactionReason.USER) || majorCompactionInProgress)
+      if (closing || closed || majorCompactionQueued.contains(MajorCompactionReason.USER) || majorCompactionInProgress || minorCompactionInProgress)
         return;
 
       if (datafileManager.getDatafileSizes().size() == 0) {
{code}

I showed this to [~ctubbsii] and he suggested checking the flushId in {{compactAll()}}. I like this approach, but it would require changing the RPC between master and tserver. It would also require changing the compact FATE op to acquire, persist, and pass the flush id if needed. It may also require changes in the RPC between client and master, so the client can indicate whether the compact should wait for a flush. This seems like a nice change for 1.7.0, but given all of the RPC changes, maybe not a good change for 1.5 and 1.6.


was (Author: kturner):
I was thinking about how to fix this. A very simple fix would be the following change in {{Tablet.compactAll()}}. However, this change could lead to starvation of a user-initiated major compaction in the case where flushes/minor compactions are constantly running on a tablet. Also, user-initiated major compactions can optionally flush; if the compact operation is not flushing, then it should probably not wait here.

{code:patch}
diff --git a/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java b/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
index bb13ff8..13e4292 100644
--- a/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
+++ b/server/src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
@@ -3901,7 +3901,7 @@ public class Tablet {
       if (lastCompactID >= compactionId)
         return;
 
-      if (closing || closed || majorCompactionQueued.contains(MajorCompactionReason.USER) || majorCompactionInProgress)
+      if (closing || closed || majorCompactionQueued.contains(MajorCompactionReason.USER) || majorCompactionInProgress || minorCompactionInProgress)
         return;
 
       if (datafileManager.getDatafileSizes().size() == 0) {
{code}

I showed this to [~ctubbsii] and he suggested checking the flushId in {{compactAll()}}. I like this approach, but it would require changing the RPC between master and tserver. It would also require changing the compact FATE op to acquire, persist, and pass the flush id if needed. It may also require changes in the RPC between client and master, so the client can indicate whether the compact should wait for a flush. This seems like a nice change for 1.7.0, but given all of the RPC changes, maybe not a good change for 1.5 and 1.6.

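To make the effect of the one-line patch easier to see, here is a minimal sketch of the guard condition before and after the change. It is not Accumulo code: {{TabletState}}, {{returnsEarlyBefore}}, and {{returnsEarlyAfter}} are hypothetical names, the enum is cut down to the one value the patch mentions plus a placeholder, and only the field names are taken from the diff.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of the guard in the patch; not Accumulo code.
class CompactAllGuardSketch {

  // Only USER appears in the patch; NORMAL is a placeholder value.
  enum MajorCompactionReason { USER, NORMAL }

  // Stand-in for the Tablet fields the guard reads (field names taken from the diff).
  static class TabletState {
    boolean closing;
    boolean closed;
    boolean majorCompactionInProgress;
    boolean minorCompactionInProgress;
    final Set<MajorCompactionReason> majorCompactionQueued =
        EnumSet.noneOf(MajorCompactionReason.class);
  }

  // Condition before the patch: a running minor compaction is not considered.
  static boolean returnsEarlyBefore(TabletState t) {
    return t.closing || t.closed
        || t.majorCompactionQueued.contains(MajorCompactionReason.USER)
        || t.majorCompactionInProgress;
  }

  // Condition after the patch: also return early while a minor compaction runs.
  static boolean returnsEarlyAfter(TabletState t) {
    return returnsEarlyBefore(t) || t.minorCompactionInProgress;
  }

  public static void main(String[] args) {
    TabletState t = new TabletState();
    t.minorCompactionInProgress = true;
    System.out.println("before patch returns early: " + returnsEarlyBefore(t));
    System.out.println("after patch returns early:  " + returnsEarlyAfter(t));
  }
}
```

With a minor compaction in progress, only the patched condition returns early. If a minor compaction happens to be running every time the request arrives, the patched condition returns early every time, which is the starvation concern raised above.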
> Compaction did not run during RW test
> -------------------------------------
>
>                 Key: ACCUMULO-3140
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3140
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0
>         Environment: 1.5.2 RC1, Hadoop 2.3.0, Zookeeper 3.4.5, CentOS 6, 20 node EC2
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> Saw the following failure while running RW test against 1.5.2 RC1
> {noformat}
> java.lang.Exception: Error running node Shard.xml
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
>         at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:63)
>         at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:122)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.accumulo.start.Main$1.run(Main.java:107)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.Exception: Error running node Verify
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
>         ... 8 more
> Caused by: java.lang.Exception: index rebuild mismatch 000050 100z:bda1000000000000 [] 1410899561685 false 000050 100z:9d20000000000000 [] 1410892435393 false ST_index_ip_10_1_2_29_ec2_internal_3328_1410892364707 ST_index_ip_10_1_2_29_ec2_internal_3328_1410892364707_tmp
>         at org.apache.accumulo.test.randomwalk.shard.VerifyIndex.visit(VerifyIndex.java:55)
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
>         ... 9 more
> {noformat}
> Determined that document ID {{9d20000000000000}} existed in the index, but not the document table.
> I found in the RW logs that a filtering compaction with the pattern {noformat}^[0-9a-f][d].*{noformat} should have removed this document from the index. However, the compaction did not run on the relevant tablet {{1w;000050;00004c}}. The test shortly after ran a filtering compaction with the pattern {noformat}^[0-9a-f][1].*{noformat}, which did cause a corresponding compaction. Below are the tserver and RW logs interleaved by time. Document {{9d20000000000000}} was indexed in shard {{000050}}.
> {noformat}
> TSERVER 2014-09-16 18:32:50,125 [tabletserver.Tablet] TABLET_HIST: 1w<;00004c split 1w;000050;00004c 1w<;000050
> TSERVER 2014-09-16 18:32:50,126 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c opened
> TSERVER 2014-09-16 18:32:57,288 [tabletserver.TabletServer] INFO : Adding 1 logs for extent 1w;000050;00004c as alias 187
> RWLOG 16 18:33:55,294 [shard.Insert] DEBUG: Inserted document 9d20000000000000
> TSERVER 2014-09-16 18:35:02,985 [tabletserver.MinorCompactor] DEBUG: Begin minor compaction /accumulo/tables/1w/t-00001mf/F0000476.rf_tmp 1w;000050;00004c
> TSERVER 2014-09-16 18:35:04,049 [tabletserver.Compactor] DEBUG: Compaction 1w;000050;00004c 83,164 read | 81,599 written | 128,936 entries/sec |  0.645 secs
> TSERVER 2014-09-16 18:35:04,053 [tabletserver.Tablet] DEBUG: Logs for memory compacted: 1w;000050;00004c 10.1.2.26+9997/1bf8ebed-e73e-460b-b54f-0b29b3d3c19c
> TSERVER 2014-09-16 18:35:04,501 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c MinC [memory] -> /t-00001mf/F0000476.rf
> TSERVER 2014-09-16 18:35:04,501 [tabletserver.Tablet] DEBUG: MinC finish lock 0.00 secs 1w;000050;00004c
> RWLOG 16 18:35:14,641 [shard.CompactFilter] DEBUG: Filtered documents using compaction iterators ^[0-9a-f][d].* 32451 19802
> TSERVER 2014-09-16 18:35:41,433 [tabletserver.Tablet] DEBUG: Starting MajC 1w;000050;00004c (USER) [/t-00001mf/F0000476.rf] --> /t-00001mf/A000048e.rf_tmp  [name:RegExFilter, priority:21, class:org.apache.accumulo.core.iterators.user.RegExFilter, properties:{matchSubstring=false, negate=true, colqRegex=^[0-9a-f][1].*, orFields=false}]
> TSERVER 2014-09-16 18:35:41,960 [tabletserver.Compactor] DEBUG: Compaction 1w;000050;00004c 81,599 read | 73,110 written | 187,583 entries/sec |  0.435 secs
> TSERVER 2014-09-16 18:35:42,079 [tabletserver.Tablet] TABLET_HIST: 1w;000050;00004c MajC [/t-00001mf/F0000476.rf] --> /t-00001mf/A000048e.rf
> RWLOG 16 18:35:43,854 [shard.CompactFilter] DEBUG: Filtered documents using compaction iterators ^[0-9a-f][1].* 18648 10103
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
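
The report states that the first filter pattern should have removed document 9d20000000000000, and the logs show that only the second pattern produced a compaction on the tablet. The sketch below checks both patterns against that document ID using plain java.util.regex. It assumes whole-string matching, which is how the logged setting matchSubstring=false is read here; it does not call RegExFilter itself, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; uses java.util.regex only, not Accumulo's RegExFilter.
class FilterPatternCheck {

  // Whole-string match, assumed to correspond to matchSubstring=false in the log.
  static boolean matchesWhole(String regex, String value) {
    return Pattern.compile(regex).matcher(value).matches();
  }

  public static void main(String[] args) {
    String docId = "9d20000000000000"; // document ID from the report

    // Pattern of the filtering compaction that did not run on the tablet.
    System.out.println(matchesWhole("^[0-9a-f][d].*", docId)); // true

    // Pattern of the later compaction that did run (18:35:41 in the tserver log).
    System.out.println(matchesWhole("^[0-9a-f][1].*", docId)); // false
  }
}
```

Since the log shows negate=true, entries whose column qualifier matches are the ones dropped. Under these assumptions the document ID matches only the first pattern, so the compaction that actually ran with the second pattern would have left it in the index.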