Date: Fri, 29 Apr 2016 16:11:13 +0000 (UTC)
From: "Paulo Motta (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9830) Option to disable bloom filter in highest level of LCS sstables

[ https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264264#comment-15264264 ]

Paulo Motta commented on CASSANDRA-9830:
----------------------------------------

There are a few situations in which a previously disabled top-level bloom filter needs to be reloaded:
- Anti-compaction causes a previously unrepaired top-level sstable to drop to L0
- Anti-compaction increases the number of levels in the repaired set (so previously top-level repaired sstables are no longer top-level)
- The disable_top_level_bloom_filter option is unset
- The user changes the compaction strategy to another strategy

Given that the main objective of this optimization is to reduce memory usage, and that rebuilding bloom filters is quite expensive, it is more reasonable to only release top-level bloom filters from memory while keeping them on disk for a potential future reload, rather than not generating (or removing) them on disk. Another benefit of keeping BFs on disk is that most of the logic stays within {{LeveledCompactionStrategy}}, rather than requiring other sstable consumers (such as tools like {{sstablelevelreset}}) to be aware that a top-level sstable may lack its bloom filter component when this option is enabled, and to handle that case accordingly.

One caveat is that when a new level L is created, overlapping sstables from L-1 must have their bloom filters reloaded to avoid expensive seeks when doing new compactions. This is done automatically by "organic" compactions when they replace compacted sstables from L-1. Since anti-compactions may create new top levels in the repaired set, we must explicitly check for overlapping sstables in lower levels and reload their bloom filters if necessary. To avoid doing this more expensive overlap check for every sstable added, I modified the compaction manager to always use the bulk add method (addSSTables), which is overridden by {{LeveledCompactionStrategy}}, so that we perform this check fewer times (especially when doing anti-compaction).

I rebased and added unit tests to cover the edge cases mentioned above.
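The release-and-reload rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the patch: the names {{SSTable}}, {{LeveledSet}}, {{bf_in_memory}}, {{overlap_checks}} and {{add_sstables}} are hypothetical, token ranges are plain integers, and Python is used for brevity rather than Cassandra's Java. It shows filters being dropped from memory only for the top level, reloaded for a lower-level sstable once a newly added higher-level sstable overlaps it, and the overlap pass running once per batch instead of once per sstable.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: a level plus an inclusive token range."""
    name: str
    level: int
    first_token: int
    last_token: int
    # Only the in-memory copy is tracked; the on-disk filter is assumed to always exist.
    bf_in_memory: bool = True

    def overlaps(self, other: "SSTable") -> bool:
        return self.first_token <= other.last_token and other.first_token <= self.last_token


class LeveledSet:
    """Toy model of one leveled set (for example the repaired set)."""

    def __init__(self) -> None:
        self.sstables: List[SSTable] = []
        self.overlap_checks = 0  # how many times the expensive pass has run

    def top_level(self) -> int:
        return max((s.level for s in self.sstables), default=0)

    def add_sstables(self, added: Iterable[SSTable]) -> None:
        """Bulk add: one overlap pass per batch rather than one per sstable."""
        added = list(added)
        self.sstables.extend(added)
        self.overlap_checks += 1
        top = self.top_level()
        for s in self.sstables:
            if top > 0 and s.level == top:
                # Top-level sstable: release the filter from memory only.
                s.bf_in_memory = False
            elif not s.bf_in_memory and any(
                t.level > s.level and s.overlaps(t) for t in added
            ):
                # No longer top-level for its range: reload from disk, no rebuild.
                s.bf_in_memory = True


# Usage: two L1 sstables form the top level, then one batch creates L2.
a = SSTable("a", level=1, first_token=0, last_token=100)
b = SSTable("b", level=1, first_token=200, last_token=300)
repaired = LeveledSet()
repaired.add_sstables([a, b])

c = SSTable("c", level=2, first_token=50, last_token=150)
d = SSTable("d", level=0, first_token=0, last_token=10)
repaired.add_sstables([c, d])
```

In this model {{a}} overlaps the new L2 sstable {{c}} and gets its filter back, while {{b}} does not overlap anything above it and stays released. A per-sstable add would have run the pass four times here instead of twice.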
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-9830]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9830-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9830-dtest/lastCompletedBuild/testReport/]|

Also resubmitted cstar_perf tests to make sure we're getting consistent results (will post results later):
* [majors|http://cstar.datastax.com/tests/id/ddf75066-0e23-11e6-979b-0256e416528f]
* [minors|http://cstar.datastax.com/tests/id/e86f087c-0e23-11e6-979b-0256e416528f]
* [repair|http://cstar.datastax.com/tests/id/da385b14-0e23-11e6-979b-0256e416528f]

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully populated series. (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually been inserted, the bloom filter on the highest level only helps reject sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level sstables. This will dramatically reduce memory usage for LCS and may even improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)