cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9830) Option to disable bloom filter in highest level of LCS sstables
Date Fri, 29 Apr 2016 16:11:13 GMT


Paulo Motta commented on CASSANDRA-9830:

There are a few situations in which a previously disabled top-level bloom filter needs to be reloaded:
- Anti-compaction causes a previously unrepaired top-level sstable to drop to L0
- Anti-compaction increases the number of levels in the repaired set (so previously top-level
repaired sstables are no longer top-level)
- The disable_top_level_bloom_filter option is unset
- The user switches to a different compaction strategy
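
As a rough illustration of the cases above, the reload decision can be written as a single predicate. This is only a sketch: {{TopLevelFilterPolicy}}, {{mayReleaseFilter}}, {{needsReload}} and their parameters are hypothetical names chosen for this example and are not taken from the patch.

```java
// Hypothetical helper; names and signatures are illustrative, not from the patch.
final class TopLevelFilterPolicy {
    // A filter may stay out of memory only while the option is set, the table
    // still uses the leveled strategy, and the sstable sits in the top level
    // of its own (repaired or unrepaired) set.
    static boolean mayReleaseFilter(boolean optionEnabled,
                                    boolean usingLeveledStrategy,
                                    int sstableLevel,
                                    int topLevelOfSet) {
        return optionEnabled && usingLeveledStrategy && sstableLevel == topLevelOfSet;
    }

    // A reload is needed when the filter is not in memory and one of the
    // conditions that allowed releasing it no longer holds.
    static boolean needsReload(boolean filterInMemory,
                               boolean optionEnabled,
                               boolean usingLeveledStrategy,
                               int sstableLevel,
                               int topLevelOfSet) {
        return !filterInMemory
            && !mayReleaseFilter(optionEnabled, usingLeveledStrategy, sstableLevel, topLevelOfSet);
    }
}
```

Each bullet maps to one argument changing: a drop to L0 changes the sstable level, a new level changes the top level of the set, and the last two bullets flip the option or strategy flags.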

Given that the main objective of this optimization is to reduce memory usage and rebuilding
bloom filters is quite expensive, rather than not generating (or removing) top-level bloom
filters on disk, it's more reasonable to only release bloom filters from memory while still
keeping them on disk for a potential reload in the future.
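
A minimal sketch of the "release from memory, keep on disk" idea follows. It assumes a toy filter backed by a {{java.util.BitSet}} and a hypothetical {{FilterHolder}} class; it is not Cassandra's sstable reader or its real bloom filter format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

// Hypothetical stand-in for an sstable and its filter component.
final class FilterHolder {
    private final Path filterFile;   // on-disk copy, never deleted by release()
    private BitSet inMemoryFilter;   // null while the filter is released

    FilterHolder(Path filterFile, BitSet filter) {
        this.filterFile = filterFile;
        this.inMemoryFilter = filter;
        try {
            Files.write(filterFile, filter.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Frees the heap copy only; the file stays available for a later reload.
    void release() {
        inMemoryFilter = null;
    }

    // Reads the filter back from disk instead of rebuilding it from the data.
    void reload() {
        if (inMemoryFilter != null)
            return;
        try {
            inMemoryFilter = BitSet.valueOf(Files.readAllBytes(filterFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isLoaded() {
        return inMemoryFilter != null;
    }

    // Without a filter in memory nothing can be ruled out, so the answer is
    // always "maybe present" and the sstable has to be consulted.
    boolean mightContain(int bit) {
        return inMemoryFilter == null || inMemoryFilter.get(bit);
    }
}
```

The point of the design is that {{reload()}} costs one file read, whereas regenerating a filter that was never written would require scanning the sstable's keys.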

Another benefit of keeping BFs on disk is that most of the logic stays within {{LeveledCompactionStrategy}}.
Other sstable consumers (such as tools like {{sstablelevelreset}}) then do not need to be
aware that a top-level sstable may lack its bloom filter component when this option is
enabled, or to handle that case themselves.

One caveat is that when a new level L is created, overlapping sstables from L-1 must have
their bloom filters reloaded to avoid expensive seeks when doing new compactions. This is automatically
done by "organic" compactions when they replace compacted sstables from L-1. Since anti-compactions
may create new top levels in the repaired set, we must explicitly check for overlapping sstables
in lower levels and reload their bloom filters if necessary. To avoid doing this more
expensive overlap check for every sstable added, I modified the compaction manager to always
use the bulk add method (addSSTables) (which is overridden by {{LeveledCompactionStrategy}})
so we can perform this check fewer times (especially when doing anti-compaction).
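
The bulk-add idea can be sketched as follows. Everything here is an assumption made for illustration: {{LeveledSketch}}, {{Table}}, the inclusive token range and the {{overlapPasses}} counter are hypothetical, and only the method name addSSTables comes from the comment above. The sketch also assumes the top level grows by one level at a time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical model of a leveled manifest; not Cassandra's implementation.
final class LeveledSketch {
    // Toy sstable: a level, an inclusive token range, and a filter flag.
    static final class Table {
        final int level;
        final long first, last;
        boolean filterLoaded;

        Table(int level, long first, long last, boolean filterLoaded) {
            this.level = level;
            this.first = first;
            this.last = last;
            this.filterLoaded = filterLoaded;
        }

        boolean overlaps(Table other) {
            return first <= other.last && other.first <= last;
        }
    }

    private final List<Table> tables = new ArrayList<>();
    private int topLevel = 0;
    int overlapPasses = 0; // how many times the expensive check ran

    // Bulk add: register the whole batch first, then run the overlap check
    // at most once, and only if the batch created a new top level.
    void addSSTables(Collection<Table> batch) {
        int oldTop = topLevel;
        for (Table t : batch) {
            tables.add(t);
            topLevel = Math.max(topLevel, t.level);
        }
        if (topLevel > oldTop)
            reloadOverlappingBelowTop();
    }

    // Reload filters of L-1 sstables that overlap any sstable in the new top level L.
    private void reloadOverlappingBelowTop() {
        overlapPasses++;
        for (Table lower : tables) {
            if (lower.level != topLevel - 1 || lower.filterLoaded)
                continue;
            for (Table top : tables) {
                if (top.level == topLevel && top.overlaps(lower)) {
                    lower.filterLoaded = true;
                    break;
                }
            }
        }
    }
}
```

Adding the same sstables one at a time would run the check once per sstable that raises or touches the top level; batching them keeps it to one pass per batch.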

I rebased and added unit tests to cover edge cases mentioned above.


Also resubmitted cstar_perf tests to make sure we're getting consistent results (will post
results later):
* [majors|]
* [minors|]
* [repair|]

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>                 Key: CASSANDRA-9830
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.x
> We expect about 90% of data to be in the highest level of LCS in a fully populated series. (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually been inserted, the bloom filter on the highest level only helps reject sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level sstables. This will dramatically reduce memory usage for LCS and may even improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)

This message was sent by Atlassian JIRA