cassandra-user mailing list archives

From DuyHai Doan <>
Subject Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values
Date Fri, 04 Jan 2019 20:15:44 GMT
The idea of storing your data as a single blob can be dangerous.

Indeed, you lose the ability to update each column individually and atomically.

In Cassandra, last-write-wins (LWW) is the rule. Suppose 2 concurrent updates on
the same row (let's say it's a Person record): the 1st update changes column
Firstname and the 2nd update changes column Lastname. If the whole record is
one blob, each update rewrites the entire blob.

Now, depending on the timestamps of the 2 updates, you'll end up with:

- old Firstname, new Lastname
- new Firstname, old Lastname

Updating the columns individually guarantees that you end up with the new
Firstname and the new Lastname.
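The last-write-wins argument above can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not Cassandra's implementation: a row is a dict of cells, each cell carries a write timestamp, reconciliation keeps the newer cell per column, and in the blob case both clients start from the same old copy of the record. The names (`Cell`, `merge_rows`) and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A single column value plus the write timestamp used for last-write-wins."""
    value: object
    ts: int


def merge_rows(a, b):
    """Reconcile two versions of a row column by column (simplified LWW).

    On equal timestamps this sketch keeps the cell from `a`; Cassandra has
    its own tie-break rule, which is not modelled here.
    """
    out = dict(a)
    for col, cell in b.items():
        if col not in out or cell.ts > out[col].ts:
            out[col] = cell
    return out


def values(row):
    return {col: cell.value for col, cell in row.items()}


stored = {"firstname": Cell("Ann", 1), "lastname": Cell("Lee", 1)}

# Case 1: each client writes only the column it changed.
upd_first = {"firstname": Cell("Anna", 2)}
upd_last = {"lastname": Cell("Li", 3)}
per_column = merge_rows(merge_rows(stored, upd_first), upd_last)

# Case 2: the record is one blob cell; each client rewrites all of it,
# starting from the same old copy it read earlier.
blob_stored = {"person": Cell({"firstname": "Ann", "lastname": "Lee"}, 1)}
blob_first = {"person": Cell({"firstname": "Anna", "lastname": "Lee"}, 2)}
blob_last = {"person": Cell({"firstname": "Ann", "lastname": "Li"}, 3)}
as_blob = merge_rows(merge_rows(blob_stored, blob_first), blob_last)

print(values(per_column))         # {'firstname': 'Anna', 'lastname': 'Li'}
print(values(as_blob)["person"])  # {'firstname': 'Ann', 'lastname': 'Li'}
```

With per-column cells both changes survive regardless of arrival order; with the blob, the later rewrite silently discards the earlier client's change.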

On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad <> wrote:

> Those are two different cases though.  It *sounds like* (again, I may be
> missing the point) you're trying to overwrite a value with another value.
> You're either going to serialize a blob and overwrite a single cell, or
> you're going to overwrite all the cells and include a tombstone.
> When you do a read, reading a single tombstone vs a single value is
> essentially the same thing, performance-wise.
> In your description you said "~ 20-100 events", and you're overwriting the
> event each time, so I don't know how you get to 10K tombstones either.
> Compaction will bring multiple tombstones together for a cell in the same
> way it compacts multiple values for a single cell.
> It sounds to me like you're taking some advice about tombstones out of
> context and trying to apply the advice to a different problem.  Again, I
> might be misunderstanding what you're doing.
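The claim that an overwritten cell costs about the same to read whether its newest version is a value or a tombstone can be sketched the same way. Assumptions, all simplifications: each SSTable is a dict from a cell key to one timestamped version, a read examines every version of that cell, and compaction keeps only the newest version (tombstones are retained, as if `gc_grace_seconds` had not yet elapsed). `TOMBSTONE`, `read_cell`, and `compact` are illustrative names.

```python
from dataclasses import dataclass

TOMBSTONE = object()  # marker standing in for a null write / deletion


@dataclass(frozen=True)
class Cell:
    value: object
    ts: int  # write timestamp used for last-write-wins


def read_cell(sstables, key):
    """Merge every version of one cell found across SSTables.
    Returns (winning cell, number of versions examined)."""
    versions = [table[key] for table in sstables if key in table]
    return max(versions, key=lambda c: c.ts), len(versions)


def compact(sstables):
    """Merge SSTables into one, keeping only the newest version of each cell.
    Tombstones are kept, as if gc_grace_seconds had not yet elapsed."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell.ts > merged[key].ts:
                merged[key] = cell
    return [merged]


key = ("happening-1", "specials")
with_value = [{key: Cell("10% off", 1)}, {key: Cell("none", 2)}]
with_null = [{key: Cell("10% off", 1)}, {key: Cell(TOMBSTONE, 2)}]

for name, tables in (("value", with_value), ("tombstone", with_null)):
    print(name, read_cell(tables, key)[1], read_cell(compact(tables), key)[1])
# value 2 1
# tombstone 2 1
```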
> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos <>
> wrote:
>> Hello Jon,
>> I thought tombstones had a much higher overhead than just overwriting
>> values. The compaction overhead may be similar, but I think the read
>> performance is much worse.
>> Tombstones accumulate and hang around for 10 days (by default,
>> gc_grace_seconds) before they are eligible to be purged by compaction.
>> Also, there are tombstone warning and failure thresholds. If Cassandra
>> scans more than 10,000 tombstones, it will abort the query.
>> According to this article:
>> "The cassandra.yaml comments explain it perfectly: *'When executing a
>> scan, within or across a partition, we need to keep the tombstones seen in
>> memory so we can return them to the coordinator, which will use them to
>> make sure other replicas also know about the deleted rows. With workloads
>> that generate a lot of tombstones, this can cause performance problems and
>> even exhaust the server heap.'*"
>> Regards,
>> Tomas
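The threshold behaviour described above, and the difference between a queue-like partition and a single overwritten row, can be sketched as a partition scan that counts tombstones. This is a hypothetical model: `None` stands for a deleted row, and the two thresholds mimic the roles of `tombstone_warn_threshold` and `tombstone_failure_threshold` in cassandra.yaml, but the values used here are toy numbers, not the defaults. `TombstoneLimitExceeded` is an invented name.

```python
import warnings


class TombstoneLimitExceeded(Exception):
    """Hypothetical stand-in for the error raised when a read crosses the
    tombstone failure threshold."""


def scan(rows, warn_threshold, failure_threshold):
    """Scan a partition in clustering order and return the live rows.

    `rows` holds values, with None standing in for a deleted row. Every
    tombstone seen is counted, because it has to be kept and returned to
    the coordinator even though it yields no data.
    """
    live, tombstones = [], 0
    for row in rows:
        if row is None:
            tombstones += 1
            if tombstones > failure_threshold:
                raise TombstoneLimitExceeded(f"more than {failure_threshold} tombstones scanned")
        else:
            live.append(row)
    if tombstones > warn_threshold:
        warnings.warn(f"read {len(live)} live rows and {tombstones} tombstones")
    return live


# Queue-like usage: 5,000 consumed-and-deleted rows sit in front of one live row.
queue_partition = [None] * 5_000 + ["next-job"]
# Overwrite usage: one row whose single cell was overwritten with null.
overwrite_partition = [None]

try:
    scan(queue_partition, warn_threshold=1_000, failure_threshold=2_000)
except TombstoneLimitExceeded as err:
    print("queue read aborted:", err)
print(scan(overwrite_partition, warn_threshold=1_000, failure_threshold=2_000))  # []
```

Overwriting one row in place can put at most one tombstone in the read path per cell, whereas the queue pattern forces the scan through every deleted entry before it reaches live data.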
>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad <> wrote:
>>> If you're overwriting values, it really doesn't matter much if it's a
>>> tombstone or any other value, they still need to be compacted and have the
>>> same overhead at read time.
>>> Tombstones are problematic when you try to use Cassandra as a queue (or
>>> something like a queue) and you need to scan over thousands of tombstones
>>> in order to get to the real data.  You're simply overwriting a row and
>>> trying to avoid a single tombstone.
>>> Maybe I'm missing something here.  Why do you think overwriting a single
>>> cell with a tombstone is any worse than overwriting a single cell with a
>>> value?
>>> Jon
>>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos <>
>>> wrote:
>>>> Hello,
>>>> I believe your approach is the same as using Spark with
>>>> "spark.cassandra.output.ignoreNulls=true".
>>>> This will not cover the situation where a value has to be overwritten
>>>> with null.
>>>> I found one possible solution: change the schema to keep only the primary
>>>> key fields and move all other fields into a frozen UDT:
>>>> create table (year, month, day, id, frozen<Event>, primary key((year,
>>>> month, day), id) )
>>>> This way, anything that is null inside the event doesn't create a
>>>> tombstone, since the frozen event is serialized as a single blob.
>>>> The penalty is having to deserialize the whole Event even when selecting
>>>> only a few columns.
>>>> Can anyone confirm whether this is a good solution performance-wise?
>>>> Thank you,
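The trade-off of the frozen-UDT schema above can be sketched in Python. Assumptions: JSON stands in for Cassandra's binary UDT encoding, a regular column written with null counts as one tombstone, and the function names and event fields are hypothetical (the fields echo the simplified event later in the thread).

```python
import json


def write_as_columns(event):
    """One cell per field. Returns (cells, tombstones); each None field
    would be written as a tombstone instead of a live cell."""
    tombstones = sum(1 for v in event.values() if v is None)
    cells = {f: v for f, v in event.items() if v is not None}
    return cells, tombstones


def write_as_frozen(event):
    """The whole event becomes ONE cell. JSON stands in for the real UDT
    encoding; nulls are stored inside the value, so no tombstones."""
    return {"event": json.dumps(event)}, 0


def read_field(frozen_cells, field):
    # Selecting a single field still means decoding the whole serialized value.
    return json.loads(frozen_cells["event"])[field]


event = {"total_amount": 50, "items": ["A", "B", "C"], "specials": None}
print(write_as_columns(event)[1])  # 1 tombstone (for "specials")
frozen, tombstones = write_as_frozen(event)
print(tombstones, read_field(frozen, "total_amount"))  # 0 50
```

Because the frozen value is a single cell, it also cannot be partially updated: every write replaces the whole event, which is the concurrency concern raised at the top of this thread.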
>>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan <> wrote:
>>>>> "The problem is I can't know the combination of set/unset values" -->
>>>>> Just for this requirement, Achilles has had a working solution for many
>>>>> years, using the INSERT_NOT_NULL_FIELDS strategy:
>>>>> Or you can use the Update API, which by design only performs updates on
>>>>> non-null fields:
>>>>> Behind the scenes, for each new combination of columns in an INSERT INTO
>>>>> table(x,y,z) statement, Achilles checks its prepared statement cache and,
>>>>> if the statement does not exist yet, creates a new prepared statement and
>>>>> puts it into the cache for later re-use.
>>>>> Disclaimer: I'm the creator of Achilles
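The "one prepared statement per combination of non-null columns" idea described above can be sketched as a small cache. This is a hypothetical illustration of the technique, not Achilles' API: `InsertStatementCache` is an invented name, and `prepare` stands in for a driver call such as `session.prepare()` (the default just returns the CQL string so the sketch runs without a cluster).

```python
class InsertStatementCache:
    """Build, and cache, one INSERT per distinct set of non-null columns."""

    def __init__(self, table, prepare=lambda cql: cql):
        self.table = table
        self.prepare = prepare
        self.cache = {}  # sorted column tuple -> prepared statement

    def bind(self, row):
        # Null fields are simply left out of the statement, so no tombstone.
        columns = tuple(sorted(c for c, v in row.items() if v is not None))
        statement = self.cache.get(columns)
        if statement is None:
            placeholders = ", ".join("?" for _ in columns)
            cql = f"INSERT INTO {self.table} ({', '.join(columns)}) VALUES ({placeholders})"
            statement = self.cache[columns] = self.prepare(cql)
        return statement, [row[c] for c in columns]


cache = InsertStatementCache("happening")
print(cache.bind({"id": "MainEvent", "a": "Priceless", "b": None}))
# ('INSERT INTO happening (a, id) VALUES (?, ?)', ['Priceless', 'MainEvent'])
```

Two limits raised elsewhere in the thread apply: with n nullable columns there can be up to 2^n cached statements, and leaving a column out never clears a value that should now be null.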
>>>>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <> wrote:
>>>>>> Hello,
>>>>>> The problem is I can't know the combination of set/unset values. From
>>>>>> my perspective every value should be set. The event from Kafka represents
>>>>>> the complete state of the happening at a certain point in time. In my
>>>>>> table I want to store the latest event, i.e. the most recent state of the
>>>>>> happening (in this table I don't care about the history). Actually, I used
>>>>>> the wrong expression, since it's just the opposite of an "incremental
>>>>>> update": every event carries all the data (state) for a specific point in
>>>>>> time.
>>>>>> The event is represented as a nested JSON structure. Top-level
>>>>>> elements of the JSON are table fields with types like text, boolean,
>>>>>> timestamp, list, and the nested elements are UDT fields.
>>>>>> Simplified example:
>>>>>> There is a new purchase for the happening, event:
>>>>>> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
>>>>>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
>>>>>> I don't know what actually happened in this event: maybe a new item
>>>>>> was purchased, maybe some customer info has changed, maybe specials
>>>>>> have been revoked and I have to reset them. I just need to store the
>>>>>> state as it arrived from Kafka; there might already be an event for
>>>>>> this happening saved before, or maybe this is the first one.
>>>>>> BR,
>>>>>> Tomas
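For full-state events like the one described above, the choice between writing nulls and ignoring them can be sketched as follows. Assumptions: the stored row and event are plain dicts, a `None` that is written counts as one tombstone (whether or not the column held a value before), and `ignore_nulls=True` models the effect of `spark.cassandra.output.ignoreNulls=true`. The function name and sample data are hypothetical.

```python
def upsert(stored, event, ignore_nulls):
    """Apply a full-state event to a stored row.

    ignore_nulls=False: a None clears the column (in Cassandra, a tombstone).
    ignore_nulls=True:  None fields are skipped, so any old value survives.
    Returns (new row, number of tombstones the write would create).
    """
    row, tombstones = dict(stored), 0
    for col, value in event.items():
        if value is None:
            if ignore_nulls:
                continue
            row.pop(col, None)
            tombstones += 1
        else:
            row[col] = value
    return row, tombstones


stored = {"id": "h1", "total_amount": 40, "specials": "10% off"}
event = {"id": "h1", "total_amount": 50, "specials": None}  # specials revoked

print(upsert(stored, event, ignore_nulls=False))
# ({'id': 'h1', 'total_amount': 50}, 1)
print(upsert(stored, event, ignore_nulls=True))
# ({'id': 'h1', 'total_amount': 50, 'specials': '10% off'}, 0)
```

The second result keeps the revoked specials, which is why ignoring nulls does not fit a "store the latest full state" requirement.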
>>>>>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens <> wrote:
>>>>>>> Depending on the use case, creating separate prepared statements for
>>>>>>> each combination of set/unset values in large INSERT/UPDATE statements
>>>>>>> may be prohibitive.
>>>>>>> Instead, you can look into driver-level support for UNSET values.
>>>>>>> This requires Cassandra 2.2 or later, IIRC.
>>>>>>> See:
>>>>>>> Java Driver:
>>>>>>> Python Driver:
>>>>>>> Node Driver:
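The UNSET approach lets a single prepared statement cover every column while still distinguishing "not provided" from "explicitly null". The sketch below only models how bind values would be chosen; `UNSET` here is a local stand-in (the DataStax Python driver provides a real sentinel as `cassandra.query.UNSET_VALUE`), and the column list and statement in the comment are hypothetical.

```python
class _Unset:
    """Local stand-in for a driver's 'unset' sentinel; not the real object."""

    def __repr__(self):
        return "UNSET"


UNSET = _Unset()

# Hypothetical column list of ONE prepared statement covering every column:
# INSERT INTO happening (id, total_amount, specials, purchase_time) VALUES (?, ?, ?, ?)
COLUMNS = ("id", "total_amount", "specials", "purchase_time")


def bind_values(event, columns=COLUMNS):
    """Key absent from the event -> UNSET: column not written, no tombstone.
    Key present with None       -> None: column cleared, tombstone."""
    return [event[c] if c in event else UNSET for c in columns]


print(bind_values({"id": "h1", "total_amount": 50, "specials": None}))
# ['h1', 50, None, UNSET]
```

This avoids one statement per column combination, but when every null in a full-state event is meaningful, those nulls are still bound as null and still produce tombstones.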
>>>>>>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <> wrote:
>>>>>>>> You say the events are incremental updates. I am interpreting that
>>>>>>>> to mean only some columns are updated; others should keep their
>>>>>>>> original values.
>>>>>>>> You are correct that inserting null creates a tombstone.
>>>>>>>> Can you insert only the columns that actually have new values and
>>>>>>>> skip the columns with no information? (Make the insert generator a
>>>>>>>> bit smarter.)
>>>>>>>> CREATE TABLE happening (id text PRIMARY KEY, event text, a text,
>>>>>>>> b text, c text);
>>>>>>>> INSERT INTO happening (id, event, a, b, c) VALUES ('MainEvent',
>>>>>>>> 'The most complete info we have right now', 'Priceless', '10 pm',
>>>>>>>> 'Grand Ballroom');
>>>>>>>> -- b changes
>>>>>>>> INSERT INTO happening (id, b) VALUES ('MainEvent', '9:30 pm');
>>>>>>>> Sean Durity
>>>>>>>> -----Original Message-----
>>>>>>>> From: Tomas Bartalos <>
>>>>>>>> Sent: Thursday, December 27, 2018 9:27 AM
>>>>>>>> To:
>>>>>>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>>>>>>>> Hello,
>>>>>>>> I'd start by describing my use case and how I'd like to use
>>>>>>>> Cassandra to solve my storage needs.
>>>>>>>> We're processing a stream of events for various happenings. Each
>>>>>>>> event has a happening_id that uniquely identifies its happening.
>>>>>>>> One happening may have many events, usually ~20-100 events. We'd
>>>>>>>> like to store only the latest event for each happening (an event is an
>>>>>>>> incremental update and contains all up-to-date data about the
>>>>>>>> happening).
>>>>>>>> Technically, the events are streamed from Kafka, processed by
>>>>>>>> Spark and saved to Cassandra.
>>>>>>>> In Cassandra we use upserts (inserts with the same primary key). So far
>>>>>>>> so good; however, then come the tombstones...
>>>>>>>> When I insert a field with a NULL value, Cassandra creates a
>>>>>>>> tombstone for this field. As I understand it, this is for space
>>>>>>>> efficiency: Cassandra doesn't have to remember that there is a NULL
>>>>>>>> value; it just deletes the respective column, and a delete creates
>>>>>>>> a ... tombstone.
>>>>>>>> I was hoping there could be an option to tell Cassandra not to be
>>>>>>>> so space-efficient and to store "unset" info without generating
>>>>>>>> tombstones, something similar to inserting empty strings instead of
>>>>>>>> null:
>>>>>>>> CREATE TABLE happening (id text PRIMARY KEY, event text);
>>>>>>>> INSERT INTO happening (id, event) VALUES ('1', 'event1');
>>>>>>>> -- tombstone is generated
>>>>>>>> INSERT INTO happening (id, event) VALUES ('1', null);
>>>>>>>> -- tombstone is not generated
>>>>>>>> INSERT INTO happening (id, event) VALUES ('1', '');
>>>>>>>> Possible solutions:
>>>>>>>> 1. Effectively disable tombstones with gc_grace_seconds = 0 (they
>>>>>>>> become purgeable at the next compaction), or set it to some reasonably
>>>>>>>> low value (1 hour?). Not good, since phantom data may reappear.
>>>>>>>> 2. Ignore NULLs on the Spark side with
>>>>>>>> "spark.cassandra.output.ignoreNulls=true". Not good, since this will
>>>>>>>> never overwrite a previously inserted event field with "empty".
>>>>>>>> 3. On inserts with Spark, find all NULL values and replace them with an
>>>>>>>> "empty" equivalent (empty string for text, 0 for integer). Very
>>>>>>>> inefficient, and it's problematic to find an "empty" equivalent for
>>>>>>>> some data types.
>>>>>>>> Until tombstones appeared, Cassandra was the right fit for our use
>>>>>>>> case; however, now I'm not sure if we're heading in the right
>>>>>>>> direction.
>>>>>>>> Could you please give me some advice on how to solve this problem?
>>>>>>>> Thank you,
>>>>>>>> Tomas
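The "phantom data may reappear" risk of solution 1 can be sketched with three replicas. This is a simplified, hypothetical model: timestamps are in seconds, replica 3 is down when the null is written, compaction on the other replicas purges the tombstone (and the value it shadowed) once it is older than `gc_grace`, and repair then copies the newest surviving cell everywhere. 864,000 seconds is the 10-day default mentioned earlier in the thread.

```python
from dataclasses import dataclass

TOMBSTONE = "<tombstone>"  # marker standing in for a null write / deletion


@dataclass(frozen=True)
class Cell:
    value: str
    ts: int  # write time in seconds


def read_after_repair(gc_grace, compact_at):
    # t=0: every replica stores "event1". t=10: a null is written (tombstone),
    # but replica 3 is down and misses it, so it still holds "event1".
    replicas = [Cell(TOMBSTONE, 10), Cell(TOMBSTONE, 10), Cell("event1", 0)]
    # Compaction at `compact_at` purges tombstones older than gc_grace
    # (on replicas 1 and 2 the shadowed "event1" is assumed already gone).
    survivors = [c for c in replicas if not (c.value == TOMBSTONE and compact_at - c.ts > gc_grace)]
    # Repair copies the newest surviving cell to every replica.
    return max(survivors, key=lambda c: c.ts).value


print(read_after_repair(gc_grace=0, compact_at=20))        # event1 (resurrected)
print(read_after_repair(gc_grace=864_000, compact_at=20))  # <tombstone> (stays deleted)
```

This is why repairs are normally expected to complete within `gc_grace_seconds`: the tombstone has to reach every replica before it may be purged.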
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail:
>>>>>>>> For additional commands, e-mail:
>>> --
>>> Jon Haddad
>>> twitter: rustyrazorblade
> --
> Jon Haddad
> twitter: rustyrazorblade
