cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Haddad <...@jonhaddad.com>
Subject Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values
Date Thu, 10 Jan 2019 00:55:59 GMT
> I’m still not sure if having tombstones vs. empty values / frozen UDTs
will have the same results.

When in doubt, benchmark.

Good luck,
Jon

On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos <tomas.bartalos@gmail.com>
wrote:

> Loosing atomic updates is a good point, but in my use case its not a
> problem, since I always overwrite the whole record (no partitial updates).
>
> I’m still not sure if having tombstones vs. empty values / frozen UDTs
> will have the same results.
> When I update one row with 10 null columns it will create 10 tombstones.
> We do OLAP processing of data stored in Cassandra with Spark.
>
> When Spark requests range of data, lets say 1000 rows, I can easily hit
> the 10 000 tombstones threshold.
>
> Even if I would not hit the error threshold Spark requests would increase
> the heap pressure, because tombstones have to be collected and returned to
> coordinator.
>
> Are my assumptions correct ?
>
> On 4 Jan 2019, at 21:15, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
> The idea of storing your data as a single blob can be dangerous.
>
> Indeed, you loose the ability to perform atomic update on each column.
>
> In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same
> row, 1st update changes column Firstname (let's say it's a Person record)
> and 2nd update changes column Lastname
>
> Now depending on the timestamp between the 2 updates, you'll have:
>
> - old Firstname, new Lastname
> - new Firstname, old Lastname
>
> having updates on columns atomically guarantees you to have new Firstname,
> new Lastname
>
> On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad <jon@jonhaddad.com> wrote:
>
>> Those are two different cases though.  It *sounds like* (again, I may be
>> missing the point) you're trying to overwrite a value with another value.
>> You're either going to serialize a blob and overwrite a single cell, or
>> you're going to overwrite all the cells and include a tombstone.
>>
>> When you do a read, reading a single tombstone vs a single vs is
>> essentially the same thing, performance wise.
>>
>> In your description you said "~ 20-100 events", and you're overwriting
>> the event each time, so I don't know how you go to 10K tombstones either.
>> Compaction will bring multiple tombstones together for a cell in the same
>> way it compacts multiple values for a single cell.
>>
>> I sounds to make like you're taking some advice about tombstones out of
>> context and trying to apply the advice to a different problem.  Again, I
>> might be misunderstanding what you're doing.
>>
>>
>> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos <tomas.bartalos@gmail.com>
>> wrote:
>>
>>> Hello Jon,
>>>
>>> I thought having tombstones is much higher overhead than just
>>> overwriting values. The compaction overhead can be l similar, but I think
>>> the read performance is much worse.
>>>
>>> Tombstones accumulate and hang for 10 days (by default) before they are
>>> eligible for compaction.
>>>
>>> Also we have tombstone warning and error thresholds. If cassandra scans
>>> more than 10 000 tombstones, she will abort the query.
>>>
>>> According to this article:
>>> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>>>
>>> "The cassandra.yaml comments explain in perfectly: *“When executing a
>>> scan, within or across a partition, we need to keep the tombstones seen in
>>> memory so we can return them to the coordinator, which will use them to
>>> make sure other replicas also know about the deleted rows. With workloads
>>> that generate a lot of tombstones, this can cause performance problems and
>>> even exhaust the server heap. "*
>>>
>>> Regards,
>>> Tomas
>>>
>>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad <jon@jonhaddad.com wrote:
>>>
>>>> If you're overwriting values, it really doesn't matter much if it's a
>>>> tombstone or any other value, they still need to be compacted and have the
>>>> same overhead at read time.
>>>>
>>>> Tombstones are problematic when you try to use Cassandra as a queue (or
>>>> something like a queue) and you need to scan over thousands of tombstones
>>>> in order to get to the real data.  You're simply overwriting a row and
>>>> trying to avoid a single tombstone.
>>>>
>>>> Maybe I'm missing something here.  Why do you think overwriting a
>>>> single cell with a tombstone is any worse than overwriting a single cell
>>>> with a value?
>>>>
>>>> Jon
>>>>
>>>>
>>>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos <tomas.bartalos@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I beleive your approach is the same as using spark with "
>>>>> spark.cassandra.output.ignoreNulls=true"
>>>>> This will not cover the situation when a value have to be overwriten
>>>>> with null.
>>>>>
>>>>> I found one possible solution - change the schema to keep only primary
>>>>> key fields and move all other fields to frozen UDT.
>>>>> create table (year, month, day, id, frozen<Event>, primary key((year,
>>>>> month, day), id) )
>>>>> In this way anything that is null inside event doesn't create
>>>>> tombstone, since event is serialized to BLOB.
>>>>> The penalty is in need of deserializing the whole Event when selecting
>>>>> only few columns.
>>>>> Can anyone confirm if this is good solution performance wise?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan <doanduyhai@gmail.com wrote:
>>>>>
>>>>>> "The problem is I can't know the combination of set/unset values"
-->
>>>>>> Just for this requirement, Achilles has a working solution for many
years
>>>>>> using INSERT_NOT_NULL_FIELDS strategy:
>>>>>>
>>>>>> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>>>>>>
>>>>>> Or you can use the Update API that by design only perform update
on
>>>>>> not null fields:
>>>>>> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>>>>>>
>>>>>>
>>>>>> Behind the scene, for each new combination of INSERT INTO
>>>>>> table(x,y,z) statement, Achilles will check its prepared statement
cache
>>>>>> and if the statement does not exist yet, create a new prepared statement
>>>>>> and put it into the cache for later re-use for you
>>>>>>
>>>>>> Disclaiment: I'm the creator of Achilles
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <
>>>>>> tomas.bartalos@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> The problem is I can't know the combination of set/unset values.
>>>>>>> From my perspective every value should be set. The event from
Kafka
>>>>>>> represents the complete state of the happening at certain point
in time. In
>>>>>>> my table I want to store the latest event so the most recent
state of the
>>>>>>> happening (in this table I don't care about the history). Actually
I used
>>>>>>> wrong expression since its just the opposite of "incremental
update", every
>>>>>>> event carries all data (state) for specific point of time.
>>>>>>>
>>>>>>> The event is represented with nested json structure. Top level
>>>>>>> elements of the json are table fields with type like text, boolean,
>>>>>>> timestamp, list and the nested elements are UDT fields.
>>>>>>>
>>>>>>> Simplified example:
>>>>>>> There is a new purchase for the happening, event:
>>>>>>> {total_amount: 50, items : [A, B, C, new_item], purchase_time
:
>>>>>>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
>>>>>>> I don't know what actually happened for this event, maybe there
is a
>>>>>>> new item purchased, maybe some customer info have been changed,
maybe the
>>>>>>> specials have been revoked and I have to reset them. I just need
to store
>>>>>>> the state as it artived from Kafka, there might already be an
event for
>>>>>>> this happening saved before, or maybe this is the first one.
>>>>>>>
>>>>>>> BR,
>>>>>>> Tomas
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens <mightye@gmail.com
wrote:
>>>>>>>
>>>>>>>> Depending on the use case, creating separate prepared statements
>>>>>>>> for each combination of set / unset values in large INSERT/UPDATE
>>>>>>>> statements may be prohibitive.
>>>>>>>>
>>>>>>>> Instead, you can look into driver level support for UNSET
values.
>>>>>>>> Requires Cassandra 2.2 or later IIRC.
>>>>>>>>
>>>>>>>> See:
>>>>>>>> Java Driver:
>>>>>>>> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
>>>>>>>> Python Driver:
>>>>>>>> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
>>>>>>>> Node Driver:
>>>>>>>> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
>>>>>>>>
>>>>>>>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
>>>>>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>>>>>
>>>>>>>>> You say the events are incremental updates. I am interpreting
this
>>>>>>>>> to mean only some columns are updated. Others should
keep their original
>>>>>>>>> values.
>>>>>>>>>
>>>>>>>>> You are correct that inserting null creates a tombstone.
>>>>>>>>>
>>>>>>>>> Can you only insert the columns that actually have new
values?
>>>>>>>>> Just skip the columns with no information. (Make the
insert generator a bit
>>>>>>>>> smarter.)
>>>>>>>>>
>>>>>>>>> Create table happening (id text primary key, event text,
a text, b
>>>>>>>>> text, c text);
>>>>>>>>> Insert into table happening (id, event, a, b, c) values
>>>>>>>>> ("MainEvent","The most complete info we have right now","Priceless","10
>>>>>>>>> pm","Grand Ballroom");
>>>>>>>>> -- b changes
>>>>>>>>> Insert into happening (id, b) values ("MainEvent","9:30
pm");
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sean Durity
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Tomas Bartalos <tomas.bartalos@gmail.com>
>>>>>>>>> Sent: Thursday, December 27, 2018 9:27 AM
>>>>>>>>> To: user@cassandra.apache.org
>>>>>>>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting
NULL
>>>>>>>>> values
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I’d start with describing my use case and how I’d
like to use
>>>>>>>>> Cassandra to solve my storage needs.
>>>>>>>>> We're processing a stream of events for various happenings.
Every
>>>>>>>>> event have a unique happening_id.
>>>>>>>>> One happening may have many events, usually ~ 20-100
events. I’d
>>>>>>>>> like to store only the latest event for the same happening
(Event is an
>>>>>>>>> incremental update and it contains all up-to date data
about happening).
>>>>>>>>> Technically the events are streamed from Kafka, processed
with
>>>>>>>>> Spark an saved to Cassandra.
>>>>>>>>> In Cassandra we use upserts (insert with same primary
key).  So
>>>>>>>>> far so good, however there comes the tombstone...
>>>>>>>>>
>>>>>>>>> When I’m inserting field with NULL value, Cassandra
creates
>>>>>>>>> tombstone for this field. As I understood this is due
to space efficiency,
>>>>>>>>> Cassandra doesn’t have to remember there is a NULL
value, she just deletes
>>>>>>>>> the respective column and a delete creates a ... tombstone.
>>>>>>>>> I was hoping there could be an option to tell Cassandra
not to be
>>>>>>>>> so space effective and store “unset" info without generating
tombstones.
>>>>>>>>> Something similar to inserting empty strings instead
of null
>>>>>>>>> values:
>>>>>>>>>
>>>>>>>>> CREATE TABLE happening (id text PRIMARY KEY, event text);
insert
>>>>>>>>> into happening (‘1’, ‘event1’); — tombstone
is generated insert into
>>>>>>>>> happening (‘1’, null); — tombstone is not generated
insert into happening
>>>>>>>>> (‘1’, '’);
>>>>>>>>>
>>>>>>>>> Possible solutions:
>>>>>>>>> 1. Disable tombstones with gc_grace_seconds = 0 or set
to
>>>>>>>>> reasonable low value (1 hour ?) . Not good, since phantom
data may
>>>>>>>>> re-appear 2. ignore NULLs on spark side with
>>>>>>>>> “spark.cassandra.output.ignoreNulls=true”. Not good
since this will never
>>>>>>>>> overwrite previously inserted event field with “empty”
one.
>>>>>>>>> 3. On inserts with spark, find all NULL values and replace
them
>>>>>>>>> with “empty” equivalent (empty string for text, 0
for integer). Very
>>>>>>>>> inefficient and problematic to find “empty” equivalent
for some data types.
>>>>>>>>>
>>>>>>>>> Until tombstones appeared Cassandra was the right fit
for our use
>>>>>>>>> case, however now I’m not sure if we’re heading the
right direction.
>>>>>>>>> Could you please give me some advice how to solve this
problem ?
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Tomas
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>>
>>>>>>>>> The information in this Internet Email is confidential
and may be
>>>>>>>>> legally privileged. It is intended solely for the addressee.
Access to this
>>>>>>>>> Email by anyone else is unauthorized. If you are not
the intended
>>>>>>>>> recipient, any disclosure, copying, distribution or any
action taken or
>>>>>>>>> omitted to be taken in reliance on it, is prohibited
and may be unlawful.
>>>>>>>>> When addressed to our clients any opinions or advice
contained in this
>>>>>>>>> Email are subject to the terms and conditions expressed
in any applicable
>>>>>>>>> governing The Home Depot terms of business or client
engagement letter. The
>>>>>>>>> Home Depot disclaims all responsibility and liability
for the accuracy and
>>>>>>>>> content of this attachment and for any damages or losses
arising from any
>>>>>>>>> inaccuracies, errors, viruses, e.g., worms, trojan horses,
etc., or other
>>>>>>>>> items of a destructive nature, which may be contained
in this attachment
>>>>>>>>> and shall not be liable for direct, indirect, consequential
or special
>>>>>>>>> damages in connection with this e-mail message or its
attachment.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
>>>>
>>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Mime
View raw message