cassandra-user mailing list archives

From Jonathan Haddad <>
Subject Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values
Date Fri, 04 Jan 2019 19:16:50 GMT
Those are two different cases though.  It *sounds like* (again, I may be
missing the point) you're trying to overwrite a value with another value.
You're either going to serialize a blob and overwrite a single cell, or
you're going to overwrite all the cells and include a tombstone.
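A toy model of those two cases, in Python, may make the difference concrete. It is only an illustration and not how Cassandra stores data: the `TOMBSTONE` marker, the function names and the sample event fields are all made up for this sketch.

```python
from typing import Any, Dict, List, Tuple

# Marker for a deleted cell in this toy model; not a Cassandra API object.
TOMBSTONE = "<tombstone>"


def cells_for_regular_columns(event: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Model an upsert with one regular column per field.

    Every field becomes its own cell, and a None value is written
    as a tombstone for that column.
    """
    return [(name, TOMBSTONE if value is None else value)
            for name, value in event.items()]


def cells_for_frozen_udt(event: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Model an upsert into a single frozen column.

    The whole event is serialized as one cell, so a None inside it
    is just part of the serialized value.
    """
    return [("event", dict(event))]


# Hypothetical event with one null field.
event = {"total_amount": 50, "specials": None, "purchase_time": "2018-12-27 13:30"}

regular = cells_for_regular_columns(event)
frozen = cells_for_frozen_udt(event)

tombstones_regular = sum(1 for _, value in regular if value is TOMBSTONE)
print("regular columns:", len(regular), "cells,", tombstones_regular, "tombstone(s)")
print("frozen UDT:", len(frozen), "cell(s)")
```

Either way it is one overwrite of one row per event; the only difference in this model is whether the null shows up as its own cell.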

When you do a read, reading a single tombstone vs a single value is
essentially the same thing, performance wise.

In your description you said "~ 20-100 events", and you're overwriting the
event each time, so I don't know how you get to 10K tombstones either.
Compaction will bring multiple tombstones together for a cell in the same
way it compacts multiple values for a single cell.
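As a rough sketch of that idea (a simplified last-write-wins model in Python, not Cassandra's actual compaction code), the class and function names below are hypothetical, timestamps are assumed to be distinct, and the sketch ignores gc_grace_seconds, which controls how long the surviving tombstone itself is kept:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CellVersion:
    timestamp: int          # write timestamp of this version
    value: Optional[str]    # None models a tombstone


def compact(versions: Iterable[CellVersion]) -> CellVersion:
    """Merge all versions of one cell: only the newest write survives,
    whether it is a value or a tombstone."""
    return max(versions, key=lambda v: v.timestamp)


# 100 overwrites of the same cell; every even write sets it to null.
versions = [
    CellVersion(ts, None if ts % 2 == 0 else f"specials-{ts}")
    for ts in range(1, 101)
]

survivor = compact(versions)
print(survivor)
```

In this model 50 tombstones and 50 values for the same cell collapse into a single entry, which is why overwriting one row 20-100 times should not by itself leave thousands of tombstones to scan.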

It sounds to me like you're taking some advice about tombstones out of
context and trying to apply the advice to a different problem.  Again, I
might be misunderstanding what you're doing.

On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos <> wrote:

> Hello Jon,
> I thought having tombstones is much higher overhead than just overwriting
> values. The compaction overhead can be similar, but I think the read
> performance is much worse.
> Tombstones accumulate and hang for 10 days (by default) before they are
> eligible for compaction.
> Also we have tombstone warning and error thresholds. If cassandra scans
> more than 10 000 tombstones, she will abort the query.
> According to this article:
> "The cassandra.yaml comments explain it perfectly: *“When executing a
> scan, within or across a partition, we need to keep the tombstones seen in
> memory so we can return them to the coordinator, which will use them to
> make sure other replicas also know about the deleted rows. With workloads
> that generate a lot of tombstones, this can cause performance problems and
> even exhaust the server heap. "*
> Regards,
> Tomas
> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad < wrote:
>> If you're overwriting values, it really doesn't matter much if it's a
>> tombstone or any other value, they still need to be compacted and have the
>> same overhead at read time.
>> Tombstones are problematic when you try to use Cassandra as a queue (or
>> something like a queue) and you need to scan over thousands of tombstones
>> in order to get to the real data.  You're simply overwriting a row and
>> trying to avoid a single tombstone.
>> Maybe I'm missing something here.  Why do you think overwriting a single
>> cell with a tombstone is any worse than overwriting a single cell with a
>> value?
>> Jon
>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos <>
>> wrote:
>>> Hello,
>>> I believe your approach is the same as using Spark with
>>> "spark.cassandra.output.ignoreNulls=true".
>>> This will not cover the situation when a value has to be overwritten
>>> with null.
>>> I found one possible solution - change the schema to keep only primary
>>> key fields and move all other fields to frozen UDT.
>>> create table (year, month, day, id, frozen<Event>, primary key((year,
>>> month, day), id) )
>>> In this way anything that is null inside event doesn't create a tombstone,
>>> since event is serialized to a BLOB.
>>> The penalty is the need to deserialize the whole Event when selecting
>>> only a few columns.
>>> Can anyone confirm whether this is a good solution performance wise?
>>> Thank you,
>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan < wrote:
>>>> "The problem is I can't know the combination of set/unset values" -->
>>>> Just for this requirement, Achilles has had a working solution for many years
>>>> using the INSERT_NOT_NULL_FIELDS strategy:
>>>> Or you can use the Update API, which by design only performs updates on
>>>> non-null fields:
>>>> Behind the scenes, for each new combination of INSERT INTO table(x,y,z)
>>>> statement, Achilles will check its prepared statement cache and, if the
>>>> statement does not exist yet, create a new prepared statement and put it
>>>> into the cache for later re-use.
>>>> Disclaimer: I'm the creator of Achilles
>>>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <
>>>>> wrote:
>>>>> Hello,
>>>>> The problem is I can't know the combination of set/unset values. From
>>>>> my perspective every value should be set. The event from Kafka represents
>>>>> the complete state of the happening at a certain point in time. In my table
>>>>> I want to store the latest event, so the most recent state of the happening
>>>>> (in this table I don't care about the history). Actually I used the wrong
>>>>> expression, since it's just the opposite of "incremental update": every
>>>>> event carries all data (state) for a specific point in time.
>>>>> The event is represented with nested json structure. Top level
>>>>> elements of the json are table fields with type like text, boolean,
>>>>> timestamp, list and the nested elements are UDT fields.
>>>>> Simplified example:
>>>>> There is a new purchase for the happening, event:
>>>>> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
>>>>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
>>>>> I don't know what actually happened for this event, maybe there is a
>>>>> new item purchased, maybe some customer info has been changed, maybe
>>>>> specials have been revoked and I have to reset them. I just need to store
>>>>> the state as it arrived from Kafka, there might already be an event for
>>>>> this happening saved before, or maybe this is the first one.
>>>>> BR,
>>>>> Tomas
>>>>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens < wrote:
>>>>>> Depending on the use case, creating separate prepared statements for
>>>>>> each combination of set / unset values in large INSERT/UPDATE statements
>>>>>> may be prohibitive.
>>>>>> Instead, you can look into driver level support for UNSET values.
>>>>>> Requires Cassandra 2.2 or later IIRC.
>>>>>> See:
>>>>>> Java Driver:
>>>>>> Python Driver:
>>>>>> Node Driver:
>>>>>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
>>>>>>> wrote:
>>>>>>> You say the events are incremental updates. I am interpreting that
>>>>>>> to mean only some columns are updated. Others should keep their
>>>>>>> values.
>>>>>>> You are correct that inserting null creates a tombstone.
>>>>>>> Can you only insert the columns that actually have new values? Just
>>>>>>> skip the columns with no information. (Make the insert generator a bit
>>>>>>> smarter.)
>>>>>>> Create table happening (id text primary key, event text, a text,
>>>>>>> b text, c text);
>>>>>>> Insert into happening (id, event, a, b, c) values
>>>>>>> ('MainEvent', 'The most complete info we have right now', 'Priceless',
>>>>>>> '10 pm', 'Grand Ballroom');
>>>>>>> -- b changes
>>>>>>> Insert into happening (id, b) values ('MainEvent', '9:30 pm');
>>>>>>> Sean Durity
>>>>>>> -----Original Message-----
>>>>>>> From: Tomas Bartalos <>
>>>>>>> Sent: Thursday, December 27, 2018 9:27 AM
>>>>>>> To:
>>>>>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>>>>>>> Hello,
>>>>>>> I’d start with describing my use case and how I’d like to use
>>>>>>> Cassandra to solve my storage needs.
>>>>>>> We're processing a stream of events for various happenings. Every
>>>>>>> event has a unique happening_id.
>>>>>>> One happening may have many events, usually ~ 20-100 events. I’d
>>>>>>> like to store only the latest event for the same happening (Event is an
>>>>>>> incremental update and it contains all up-to-date data about the
>>>>>>> happening).
>>>>>>> Technically the events are streamed from Kafka, processed with Spark
>>>>>>> and saved to Cassandra.
>>>>>>> In Cassandra we use upserts (insert with same primary key). So far
>>>>>>> so good, however there comes the tombstone...
>>>>>>> When I’m inserting a field with a NULL value, Cassandra creates a
>>>>>>> tombstone for this field. As I understood, this is due to space
>>>>>>> efficiency: Cassandra doesn’t have to remember there is a NULL value, she
>>>>>>> just deletes the respective column, and a delete creates a ... tombstone.
>>>>>>> I was hoping there could be an option to tell Cassandra not to be so
>>>>>>> space efficient and store “unset” info without generating tombstones.
>>>>>>> Something similar to inserting empty strings instead of null:
>>>>>>> CREATE TABLE happening (id text PRIMARY KEY, event text);
>>>>>>> INSERT INTO happening (id, event) VALUES ('1', 'event1');
>>>>>>> -- tombstone is generated
>>>>>>> INSERT INTO happening (id, event) VALUES ('1', null);
>>>>>>> -- tombstone is not generated
>>>>>>> INSERT INTO happening (id, event) VALUES ('1', '');
>>>>>>> Possible solutions:
>>>>>>> 1. Disable tombstones with gc_grace_seconds = 0 or set it to a reasonably
>>>>>>> low value (1 hour?). Not good, since phantom data may re-appear.
>>>>>>> 2. Ignore NULLs on the Spark side with
>>>>>>> “spark.cassandra.output.ignoreNulls=true”. Not good, since this will
>>>>>>> never overwrite a previously inserted event field with an “empty” one.
>>>>>>> 3. On inserts with Spark, find all NULL values and replace them with an
>>>>>>> “empty” equivalent (empty string for text, 0 for integer). Very
>>>>>>> inefficient, and problematic to find an “empty” equivalent for some data
>>>>>>> types.
>>>>>>> Until tombstones appeared Cassandra was the right fit for our use
>>>>>>> case, however now I’m not sure if we’re heading in the right direction.
>>>>>>> Could you please give me some advice on how to solve this problem?
>>>>>>> Thank you,
>>>>>>> Tomas

Jon Haddad
twitter: rustyrazorblade
