cassandra-user mailing list archives

From Jon Haddad <...@jonhaddad.com>
Subject Re: Questions about C* performance related to tombstone
Date Tue, 09 Apr 2019 14:56:56 GMT
Normal deletes are fine.

Sadly there's a lot of hand-wringing about tombstones in the generic
sense which leads people to try to work around *every* case where
they're used.  This is unnecessary.  A tombstone over a single row
isn't a problem, especially if you're only fetching that one row back.
Tombstones can be quite terrible under a few conditions:

1. When a range tombstone shadows hundreds / thousands / millions of
rows.  This wasn't even detectable prior to Cassandra 3 unless you
were either looking for it specifically or were doing CPU profiling:
http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html
2. When rows were frequently created then deleted, and scanned over.
This is the queue pattern that we detest so much.
3. When they're created as a side effect of overwriting
collections.  This is typically an accident.
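
A rough CQL sketch of the three cases, for illustration only. The first
statement is the delete from the question quoted below; the job_queue and
user_prefs tables, their columns, and the literal values are hypothetical
and not from the original schema.

```cql
-- 1. Range tombstone: a single delete on partition key + clustering prefix
--    shadows every assignment_item_id row under assignment 'A1'.
DELETE FROM myTable WHERE course_id = 'C' AND assignment_id = 'A1';

-- 2. Queue pattern (hypothetical table): rows are inserted, read from the
--    head of the partition, then deleted. Later reads of the head have to
--    skip over the tombstones left by earlier deletes.
CREATE TABLE job_queue (
    queue_name text,
    enqueued_at timeuuid,
    payload text,
    PRIMARY KEY (queue_name, enqueued_at)
);
INSERT INTO job_queue (queue_name, enqueued_at, payload)
VALUES ('emails', now(), 'job-1');
SELECT * FROM job_queue WHERE queue_name = 'emails' LIMIT 1;
-- The timeuuid below is an example value standing in for the one just read.
DELETE FROM job_queue
 WHERE queue_name = 'emails'
   AND enqueued_at = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- 3. Collection overwrite (hypothetical table): assigning a whole
--    non-frozen collection first writes a tombstone to clear the old
--    contents, even if the collection was empty.
CREATE TABLE user_prefs (
    user_id text PRIMARY KEY,
    tags set<text>
);
UPDATE user_prefs SET tags = {'a', 'b'} WHERE user_id = 'u1';
-- Appending to the collection instead does not write that tombstone.
UPDATE user_prefs SET tags = tags + {'c'} WHERE user_id = 'u1';
```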

The 'active' flag is good if you want to be able to go back and look
at old deleted assignments.  If you don't care about that, use a
normal delete.
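
As a sketch of the two options against the table from the question (the
assignment_item_id value 'I1' is a made-up example):

```cql
-- Soft delete: the row stays, so old assignments remain readable.
-- As far as I know, UPDATE needs the full primary key, so each item is
-- flagged individually and the application filters on active = false.
UPDATE myTable SET active = false
 WHERE course_id = 'C'
   AND assignment_id = 'A1'
   AND assignment_item_id = 'I1';

-- Normal delete: removes the whole assignment with one statement
-- (deleting by clustering prefix requires Cassandra 3.0 or later).
DELETE FROM myTable WHERE course_id = 'C' AND assignment_id = 'A1';
```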

Jon

On Tue, Apr 9, 2019 at 7:00 AM Li, George <guangxing.li@pearson.com> wrote:
>
> Hi,
>
> I have a table defined like this:
>
> CREATE TABLE myTable (
> course_id text,
> assignment_id text,
> assignment_item_id text,
> data text,
> active boolean,
> PRIMARY KEY (course_id, assignment_id, assignment_item_id)
> );
> i.e. course_id as the partition key and assignment_id, assignment_item_id
> as clustering keys.
>
> After data is populated, some delete queries by course_id and
> assignment_id occur, e.g. "DELETE FROM myTable WHERE course_id = 'C' AND
> assignment_id = 'A1';". This would create tombstones, so the query
> "SELECT * FROM myTable WHERE course_id = 'C';" would be affected, right?
> Would the query "SELECT * FROM myTable WHERE course_id = 'C' AND
> assignment_id = 'A2';" be affected too?
>
> For the query "SELECT * FROM myTable WHERE course_id = 'C';", to work
> around the tombstone problem, we are thinking about doing soft deletes
> instead of hard deletes. So instead of doing "DELETE FROM myTable WHERE
> course_id = 'C' AND assignment_id = 'A1';", we do "UPDATE myTable SET
> active = false WHERE course_id = 'C' AND assignment_id = 'A1';". Then in
> the application, we run "SELECT * FROM myTable WHERE course_id = 'C';"
> and filter out records that have "active" equal to "false". I am not
> really sure this would improve performance, because C* still has to scan
> through all records with the partition key "C". Instead of scanning
> through X records + Y tombstone records with hard deletes that generate
> tombstones, it now scans through X + Y records with soft deletes and no
> tombstones. Am I right?
>
> Thanks.
>
> George
