cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9107) More accurate row count estimates
Date Thu, 04 Jun 2015 18:29:40 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-9107:
-----------------------------------------
    Fix Version/s:     (was: 2.1.x)
                   2.2.0 rc1
                   2.1.6

> More accurate row count estimates
> ---------------------------------
>
>                 Key: CASSANDRA-9107
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9107
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>             Fix For: 2.1.6, 2.2.0 rc1
>
>         Attachments: 9107-cassandra2-1.patch, 9107-v2.txt
>
>
> Currently the estimated row count from cfstats is the sum of the number of rows in all
> the sstables. This becomes very inaccurate with wide rows or heavily updated datasets, because
> the same partition can exist in many sstables. For example:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
> CREATE TABLE wide (key text PRIMARY KEY, value text) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 'max_threshold': 100};
> -------------------------------
> insert INTO wide (key, value) VALUES ('key', 'value');
> // flush
> // cfstats output: Number of keys (estimate): 1  (128 in older version from index)
> insert INTO wide (key, value) VALUES ('key', 'value');
> // flush
> // cfstats output: Number of keys (estimate): 2  (256 in older version from index)
> ... etc
> {code}
> Previously it used the index, but it still computed the estimate per sstable and summed
> the results, which became inaccurate as the number of sstables grew (in the same way, only
> much worse). With new versions of sstables we can merge the cardinalities to resolve this,
> with a slight hit to accuracy in the case where every sstable has completely unique partitions.
> Furthermore, I think it would take fairly minimal effort to include the number of rows in
> the memtables in this count. We won't have cardinality merging between memtables and sstables,
> but I would consider that a relatively minor drawback.
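
To make the difference between summing per-sstable counts and merging cardinalities concrete, here is a minimal sketch in Python. It is not taken from the attached patches: `TinyHLL` and `sstable_sketch` are hypothetical names, and the HyperLogLog-style estimator is a simplified stand-in for whatever mergeable cardinality estimator the sstable metadata actually stores. It replays the example above, where the same partition key is flushed into three sstables.

```python
import hashlib
import math


class TinyHLL:
    """Minimal HyperLogLog-style sketch (illustrative stand-in, not Cassandra code)."""

    def __init__(self, p=10):
        self.p = p                       # number of bits used to pick a register
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, key):
        # Derive a 64-bit hash from the partition key.
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging takes the register-wise maximum, so a key seen in both
        # sketches contributes only once to the merged estimate.
        merged = TinyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def sstable_sketch(keys):
    """Build the sketch one (hypothetical) sstable would carry for its partition keys."""
    sketch = TinyHLL()
    for key in keys:
        sketch.add(key)
    return sketch


# Three flushes, each containing the same single partition 'key'.
sstables = [sstable_sketch(["key"]) for _ in range(3)]

# Per-sstable estimates summed: the partition is counted once per sstable.
summed = sum(s.estimate() for s in sstables)

# Merged estimate: sketches are combined first, then estimated once.
merged = sstables[0]
for s in sstables[1:]:
    merged = merged.merge(s)
merged_estimate = merged.estimate()

print("summed:", round(summed), "merged:", round(merged_estimate))
```

Because merging keeps the register-wise maximum, the merged sketch for three copies of the same partition is identical to the sketch of a single sstable, so the estimate no longer grows with every flush. The price is that the result is an approximation, which matches the "slight hit to accuracy" noted above for sstables whose partitions are all distinct.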



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
