cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9830) Option to disable bloom filter in highest level of LCS sstables
Date Tue, 23 Feb 2016 01:43:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158111#comment-15158111
] 

Paulo Motta commented on CASSANDRA-9830:
----------------------------------------

Thanks for the review! Follow-up below:

bq. The call ordering in SSTableReader.releaseBloomFilter is wrong; it's causing us not to
reclaim the memory used by the bloom filter. We're currently releasing the newly created AlwaysPresentFilter
instead of the previous bloom filter.

right, this was probably the reason the minor compaction cstar test was not behaving correctly.
resubmitted cstar tests with fixed version:

[majors|http://cstar.datastax.com/tests/id/dd0ac73c-d9c1-11e5-98e3-0256e416528f]
[minors|http://cstar.datastax.com/tests/id/e228d34e-d9c1-11e5-98e3-0256e416528f]
[lower bfp|http://cstar.datastax.com/tests/id/1a2e98d2-d9c2-11e5-98e3-0256e416528f]
[inc repair|http://cstar.datastax.com/tests/id/4d3defa2-d9c2-11e5-98e3-0256e416528f]

bq. It looks like your new branch is skip_top_level_bloom_filter_volatile; can you run the
unit and dtests and add links here?

Latest branch and new tests below:
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:9830-trunk]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-9830-trunk-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-9830-trunk-dtest/lastCompletedBuild/testReport/]|

bq. We should be able to use LeveledCompactionStrategy.addSSTable and {{LeveledCompactionStrategy.startup
instead of adding a new call into the ACS hierarchy. Also, this way we know which sstable
we have just added, instead of having to iterate over all of the ones in the top level. We
just need to add a check to isActive in addSSTable to make sure we aren't just now adding
all sstables (and not necessarily looking at the right levels), and that we look at all of
the top level in startup.

I used that approach initially but then I changed it to protect against the scenario where
{{replaceSSTables}} introduces a new level, so if any other level is added before the highest
level it could have its bloom filter cleared (for instance if all sstables on L0, and then
I run a major compaction). But I guess I can do that on {{LeveledManifest.replace}} instead
of {{CompactionStrategyManager.handleNotification()}}, so I modified the implementation do
that instead.

I added a new option {{batchAdd}} to {{LeveledManifest.add}} that does not clear bloom filters
when set. This option is set by {{CompactionStrategyManager.startup()}} that does not need
to clear bloom filters and by {{LeveledManifest.replace}}, that performs the bloom filter
clearing after adding all sstables. I also added the test {{testDisableTopLevelBloomFilterMajorCompactionOnL0}}
to test the scenario described before.

bq. It seems a little unintuitive that changing skip_top_level_bloom_filter from true to false
will not cause the top level to have bloom filters – the other way is true, though (you
delete all the top level bloom filters when going to skip bf). We should probably generate
a client-side warning when trying to make that change, just so users are aware it won't be
regenerated until compaction regenerates the bfs.

With the new implementation either true -> false or false -> true will only apply to
newly compacted sstables, similar to other options so I don't think the warning is necessary.
In any case, I added a remark to the documentation saying that the option will only apply
to newly compacted sstables.

bq. It looks like we use instanceof AlwaysPresentFilter in CompactionController; we should
consolidate that and IFilter.isAlwaysPresent().

fixed, thanks!

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully populated series.
 (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually been inserted,
the bloom filter on the highest level only helps reject sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level sstables.
 This will dramatically reduce memory usage for LCS and may even improve performance as we
no longer check a low-value filter.
> (This is also an idea from RocksDB.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message