cassandra-user mailing list archives

From "Peer, Oded" <Oded.P...@rsa.com>
Subject RE: Inserting null values
Date Thu, 07 May 2015 06:35:46 GMT
I’ve added an option to trunk that prevents tombstone creation when using PreparedStatements;
see CASSANDRA-7304.

The problem is having tombstones in regular columns.
When you perform a read request (range query or by PK):
- Cassandra iterates over all the cells (all, not only the cells specified in the query) in
the relevant rows while counting tombstone cells (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java#L199)
- creates a ColumnFamily object instance with the rows
- filters the selected columns from the internal CF (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java#L653)
- returns the result

If you have many unnecessary tombstones, you read many unnecessary cells.
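
The read steps above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual code: the names `TOMBSTONE` and `read_row` and the sample row are hypothetical, and a row is modelled as a plain dict of cells. It shows that tombstones in columns the query never selected are still scanned and counted.

```python
# Toy model of the read path described above; NOT Cassandra's implementation.
# TOMBSTONE, read_row and the sample row are hypothetical, for illustration only.
TOMBSTONE = object()  # sentinel standing in for a tombstone cell


def read_row(row, selected_columns):
    """Scan every cell in the row, then keep only the selected columns."""
    scanned = 0
    tombstones = 0
    live = {}
    for name, value in row.items():  # all cells, not only the selected ones
        scanned += 1
        if value is TOMBSTONE:
            tombstones += 1
        else:
            live[name] = value
    # Column selection happens only after the full scan.
    result = {c: live[c] for c in selected_columns if c in live}
    return result, scanned, tombstones


row = {"id": 1, "name": "a", "email": TOMBSTONE, "phone": TOMBSTONE}
result, scanned, tombstones = read_row(row, ["name"])
```

Even though only `name` is selected, all four cells are scanned and both tombstones are counted.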



From: Eric Stevens [mailto:mightye@gmail.com]
Sent: Wednesday, May 06, 2015 4:37 PM
To: user@cassandra.apache.org
Subject: Re: Inserting null values

I agree that inserting null is not as good as not inserting that column at all when you have
confidence that you are not shadowing any underlying data. But pragmatically speaking it really
doesn't sound like a small number of incidental nulls/tombstones (< 20% of columns, otherwise
CASSANDRA-3442 takes over) is going to have any performance impact either in your query patterns
or in compaction in any practical sense.
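
One way to avoid writing null at all is to leave the column out of the statement. The helper below is a minimal sketch of that idea; `build_insert` and the `users` table are hypothetical names, and it is only appropriate when you are confident the omitted columns are not shadowing existing data.

```python
# Sketch: build an INSERT that omits columns whose value is None,
# so no tombstone is written for them. build_insert and the "users"
# table are hypothetical names used only for illustration.
def build_insert(table, values):
    """Return (cql, params) for an INSERT that skips None-valued columns."""
    present = {col: val for col, val in values.items() if val is not None}
    if not present:
        raise ValueError("no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join("?" for _ in present)
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, list(present.values())


cql, params = build_insert("users", {"id": 1, "name": "a", "email": None})
```

A trade-off of this approach is that each distinct set of non-null columns produces a different statement string, so more statements need to be prepared.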

If INSERT of null values is problematic for small portions of your data, then it stands to
reason that an INSERT option containing an instruction to prevent tombstone creation would
be an important performance optimization (and would also address the fact that non-null collections
generate tombstones on INSERT).  Something like: INSERT INTO ... USING no_tombstones;


> There's thresholds (log messages, etc.) which operate on tombstone counts over a certain
> number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to clustering scans, right?
I.e., tombstones don't count against those thresholds if they are not part of the clustering
key column being considered for the non-EQ relation?  The documentation certainly implies
so:

tombstone_warn_threshold <http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold>
(Default: 1000) The maximum number of tombstones a query can scan before warning.
tombstone_failure_threshold <http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold>
(Default: 100000) The maximum number of tombstones a query can scan before aborting.
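
As a rough illustration of how the two documented defaults relate, the sketch below classifies a per-query tombstone count. The function name `check_tombstones` is hypothetical, the strict `>` comparison is an assumption, and the sketch says nothing about which cells Cassandra counts toward the thresholds.

```python
# Sketch of the two thresholds quoted above; not Cassandra's code.
# check_tombstones is a hypothetical helper and the strict ">" is an assumption.
TOMBSTONE_WARN_THRESHOLD = 1000       # documented default
TOMBSTONE_FAILURE_THRESHOLD = 100000  # documented default


def check_tombstones(count,
                     warn=TOMBSTONE_WARN_THRESHOLD,
                     fail=TOMBSTONE_FAILURE_THRESHOLD):
    """Classify a query by the number of tombstones it scanned."""
    if count > fail:
        return "abort"
    if count > warn:
        return "warn"
    return "ok"


statuses = [check_tombstones(n) for n in (10, 5000, 200000)]
```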

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli <rcoli@eventbrite.com> wrote:
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens <mightye@gmail.com> wrote:
In the end, inserting a tombstone into a non-clustered column shouldn't be appreciably worse
(if it is at all) than inserting a value instead.  Or am I missing something here?

There's thresholds (log messages, etc.) which operate on tombstone counts over a certain number,
but not on column counts over the same number.

Given that tombstones are often smaller than data columns, sorta hard to understand conceptually?

=Rob

