Date: Tue, 19 Aug 2014 21:37:19 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-3067) scan performance degrades after compaction

    [ https://issues.apache.org/jira/browse/ACCUMULO-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102888#comment-14102888 ]

Josh Elser commented on ACCUMULO-3067:
--------------------------------------

bq. in your screen shot it looks like you have nearly filled DFS. That usually leads to bad things.

Of the HDFS space being used, Accumulo is using 99.9% of it. This is different from HDFS being full.

> scan performance degrades after compaction
> -------------------------------------------
>
>                 Key: ACCUMULO-3067
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3067
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: MacBook Pro 2.6 GHz Intel Core i7, 16 GB RAM, SSD, OS X 10.9.4, single tablet server process, single client process
>            Reporter: Adam Fuchs
>         Attachments: Screen Shot 2014-08-19 at 4.19.37 PM.png, accumulo_query_perf_test.tar.gz
>
>
> I've been running some scan performance tests on 1.6.0, and I'm running into an interesting situation in which query performance starts at a certain level and then degrades by ~15% after an event. The test follows roughly this scenario:
> # Single tablet server instance
> # Load 100M small (~10-byte) key/values into a tablet and let it finish major compacting
> # Disable the garbage collector (this makes the time to _the event_ longer)
> # Restart the tablet server
> # Repeatedly scan from the beginning to the end of the table in a loop
> # Something happens on the tablet server, like one of {idle compaction of the metadata table, forced flush of the metadata table, forced compaction of the metadata table, forced flush of the trace table}
> # Observe that scan rates drop by 15-20%
> # Observe that restarting the scan does not restore performance to its original level; performance only recovers upon restarting the tablet server.
> I've been able to get this not to happen by removing iterators from the iterator tree. It doesn't seem to matter which iterators, but removing a certain number both improves performance (significantly) and eliminates the degradation problem.
> The default iterator tree includes (VersioningIterator, SynchronizedIterator, VisibilityFilter, ColumnQualifierFilter, ColumnFamilySkippingIterator, DeletingIterator, StatsIterator, MultiIterator, (MemoryIterator*, RFile.Reader*)). Narrowing this to (VisibilityFilter, ColumnFamilySkippingIterator, DeletingIterator, StatsIterator, MultiIterator, (MemoryIterator*, RFile.Reader*)) eliminates the weird condition. There are also other combinations that perform much better than the default. I haven't been able to isolate this problem to a single iterator, despite removing each iterator one at a time.
> Anybody know what might be happening here? Best theory so far: the JVM learns that the iterators can be used in a different way after a compaction, and some JVM optimization like JIT compilation, branch prediction, or automatic inlining stops being applied.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
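
As a rough sketch of the scan loop in step 5 of the scenario above, a minimal client against the 1.6 API could look like the following. The instance name, ZooKeeper address, credentials, and table name are placeholders rather than values taken from this report; the actual harness is presumably the attached accumulo_query_perf_test.tar.gz.

{code:java}
import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class ScanLoop {
  public static void main(String[] args) throws Exception {
    // Placeholder instance name, ZooKeeper host, credentials, and table name.
    Connector conn = new ZooKeeperInstance("test", "localhost")
        .getConnector("root", new PasswordToken("secret"));

    // Repeatedly scan the whole table, reporting entries/sec per pass so a
    // drop after the metadata-table compaction shows up between passes.
    for (int pass = 1; ; pass++) {
      Scanner scanner = conn.createScanner("perftest", Authorizations.EMPTY);
      long count = 0;
      long start = System.currentTimeMillis();
      for (Map.Entry<Key,Value> entry : scanner) {
        count++;
      }
      long elapsed = System.currentTimeMillis() - start;
      System.out.printf("pass %d: %d entries in %d ms (%.0f entries/sec)%n",
          pass, count, elapsed, 1000.0 * count / Math.max(1, elapsed));
    }
  }
}
{code}

Timing each full pass separately should make the 15-20% degradation visible as a step change between passes around the time of the metadata table event.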