cassandra-commits mailing list archives

From "Kurt Greaves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11163) Summaries are needlessly rebuilt when the BF FP ratio is changed
Date Tue, 06 Mar 2018 00:01:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387002#comment-16387002 ]

Kurt Greaves commented on CASSANDRA-11163:
------------------------------------------

Correct. As I noted previously:

bq. Only regenerate and persist the bloomfilter when it's missing - not when it has changed.
This means we rely on compactions/upgradesstables to update the bloomfilter.
bq. There's definitely no reason to regenerate Summaries in this case, and as previously mentioned
it's not great regenerating the bloomfilter unless you're going to persist it. I have added
persistence for the bloomfilter (when it is regenerated), however I think it's a bad idea
to do this on startup as it will likely be more time consuming than regenerating the summaries.

So the previous behaviour was to regenerate the BF on the next startup in this case but *not*
persist it (this meant it would happen on every startup until compactions/upgrades had occurred).
The summaries would be regenerated and persisted on the next startup (pointlessly). Both of
these things would slow startup time pretty significantly depending on how much data you had.

The new behaviour would be to avoid regenerating BF/Summaries at all on startup and instead
rely on upgradesstables/compactions to update them. Summaries would only be recreated when
necessary (when not loaded/corrupt/missing).
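To make the difference concrete, here is a rough sketch of the two startup decisions as I read
them from the description above. It is not the patch and not Cassandra code: the class, enum,
and field names are all made up for illustration, and the "previous" branch assumes a missing
component was also rebuilt back then.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch only: none of these names exist in Cassandra.
final class StartupComponentPolicy {
    enum Component { FILTER, SUMMARY }

    /** Hypothetical snapshot of what startup finds for a single sstable. */
    static final class SSTableState {
        final boolean filterPresent;   // Filter.db exists on disk
        final boolean summaryLoadable; // Summary.db exists, is not corrupt, and loaded
        final boolean fpChanceChanged; // configured BF FP ratio differs from the persisted filter's

        SSTableState(boolean filterPresent, boolean summaryLoadable, boolean fpChanceChanged) {
            this.filterPresent = filterPresent;
            this.summaryLoadable = summaryLoadable;
            this.fpChanceChanged = fpChanceChanged;
        }
    }

    /** Previous behaviour as described: a changed FP ratio triggers both rebuilds on startup. */
    static Set<Component> previousBehaviour(SSTableState s) {
        Set<Component> rebuild = EnumSet.noneOf(Component.class);
        if (!s.filterPresent || s.fpChanceChanged)
            rebuild.add(Component.FILTER);  // rebuilt but not persisted, so repeated every startup
        if (!s.summaryLoadable || s.fpChanceChanged)
            rebuild.add(Component.SUMMARY); // rebuilt and persisted, with identical contents
        return rebuild;
    }

    /** Proposed behaviour: a changed FP ratio alone triggers nothing on startup. */
    static Set<Component> proposedBehaviour(SSTableState s) {
        Set<Component> rebuild = EnumSet.noneOf(Component.class);
        if (!s.filterPresent)
            rebuild.add(Component.FILTER);  // only when missing
        if (!s.summaryLoadable)
            rebuild.add(Component.SUMMARY); // only when not loaded/corrupt/missing
        return rebuild;
    }

    public static void main(String[] args) {
        // The case from this ticket: everything is on disk, only the FP ratio changed.
        SSTableState ratioChanged = new SSTableState(true, true, true);
        System.out.println("previous: " + previousBehaviour(ratioChanged));
        System.out.println("proposed: " + proposedBehaviour(ratioChanged));
    }
}
```

In the proposed branch the FP ratio change is picked up later, when compactions or
upgradesstables rewrite the sstable.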

In trunk it might make sense to also add a nodetool command that will allow us to regenerate
the bloomfilters/summaries/etc without re-writing the whole data file.

> Summaries are needlessly rebuilt when the BF FP ratio is changed
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11163
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11163
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> This is from trunk, but I also saw this happen on 2.0:
> Before:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 221460
> drwxr-xr-x 2 root root      4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root     26518 Feb 11 23:50 ma-6-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root   2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root    192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root  35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root      2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root     26518 Feb 11 23:50 ma-7-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root   2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root    192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root  35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root      2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root    104178 Feb 11 23:50 ma-5-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root  10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root    757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root      8508 Feb 11 23:50 ma-5-big-CRC.db
> root@bw-1:/srv/cassandra# md5sum /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> BF ratio changed, node restarted:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 242168
> drwxr-xr-x 2 root root      4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root   2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root    192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root  35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root      2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root   2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root    192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root  35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root      2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root  10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root    757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root      8508 Feb 11 23:50 ma-5-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 12 00:03 ma-8-big-TOC.txt
> -rw-r--r-- 1 root root     14902 Feb 12 00:03 ma-8-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 12 00:03 ma-8-big-Statistics.db
> -rw-r--r-- 1 root root   1458631 Feb 12 00:03 ma-8-big-Index.db
> -rw-r--r-- 1 root root     10808 Feb 12 00:03 ma-8-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 12 00:03 ma-8-big-Digest.crc32
> -rw-r--r-- 1 root root  19660275 Feb 12 00:03 ma-8-big-Data.db
> -rw-r--r-- 1 root root      1204 Feb 12 00:03 ma-8-big-CRC.db
> -rw-r--r-- 1 root root     26518 Feb 12 00:04 ma-7-big-Summary.db
> -rw-r--r-- 1 root root     26518 Feb 12 00:04 ma-6-big-Summary.db
> -rw-r--r-- 1 root root    104178 Feb 12 00:04 ma-5-big-Summary.db
> root@bw-1:/srv/cassandra# md5sum /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> This hurts startup time and appears to do nothing useful whatsoever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


