From: Jonathan Haddad
Date: Wed, 9 Jan 2019 16:55:59 -0800
Subject: Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values
To: user@cassandra.apache.org

> I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results.

When in doubt, benchmark.

Good luck,
Jon

On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos <tomas.bartalos@gmail.com> wrote:
Losing atomic updates is a good point, but in my use case it's not a problem, since I always overwrite the whole record (no partial updates).
I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results.
When I update one row with 10 null columns it will create 10 tombstones.
We do OLAP processing of data stored in Cassandra with Spark.

When Spark requests a range of data, let's say 1000 rows, I can easily hit the 10 000 tombstones threshold.

Even if I did not hit the error threshold, Spark requests would increase the heap pressure, because tombstones have to be collected and returned to the coordinator.

Are my assumptions correct?
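
A quick back-of-the-envelope sketch of the arithmetic above. The 10 null columns, the 1000 rows and the 10 000 threshold are the figures quoted in this message; the function name is made up for illustration, and the model pessimistically assumes that every null cell is still a live tombstone when the range is read (nothing has been purged yet).

```python
def tombstones_scanned(rows: int, null_columns_per_row: int) -> int:
    """Worst-case tombstones a range read touches (one per null cell)."""
    return rows * null_columns_per_row


# The threshold is whatever your cluster has configured; 10 000 is simply
# the number used in this thread.
THRESHOLD = 10_000

scanned = tombstones_scanned(rows=1000, null_columns_per_row=10)
print(scanned, scanned >= THRESHOLD)
```

Whether a real range read gets anywhere near this number depends on how many of those cells have already been compacted away, which is what a benchmark on real data would show.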

On 4 Jan 2019, at 21:15, DuyHai Doan <doanduyhai@gmail.com> wrote:

The idea of storing your data as a single blob can be dangerous.

Indeed, you lose the ability to perform atomic updates on each column.

In Cassandra, LWW (last write wins) is the rule. Suppose there are 2 concurrent updates on the same row: the 1st update changes column Firstname (let's say it's a Person record) and the 2nd update changes column Lastname.

Now, with a single blob, depending on the timestamps of the 2 updates, you'll have either:

- old Firstname, new Lastname

- new Firstname, old Lastname

Having the updates applied atomically per column guarantees that you end up with new Firstname, new Lastname.
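
The difference can be sketched with a toy last-write-wins merge. This is only a model of the reconciliation rule described above, not Cassandra code: the names `lww` and `merge_rows` are invented for the example, timestamps are plain integers, and tie-breaking on equal timestamps is ignored.

```python
from typing import Dict, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def lww(a: Cell, b: Cell) -> Cell:
    """Last write wins: keep the cell with the higher timestamp."""
    return a if a[0] >= b[0] else b


def merge_rows(x: Dict[str, Cell], y: Dict[str, Cell]) -> Dict[str, Cell]:
    """Reconcile two versions of a row column by column."""
    merged = dict(x)
    for column, cell in y.items():
        merged[column] = lww(merged[column], cell) if column in merged else cell
    return merged


# Per-column schema: each update only writes the column it changes.
base = {"firstname": (1, "old"), "lastname": (1, "old")}
update_1 = {"firstname": (2, "new")}
update_2 = {"lastname": (3, "new")}
per_column = merge_rows(merge_rows(base, update_1), update_2)

# Single-blob schema: each writer read the old record, changed one field and
# wrote the whole record back as one cell, so only one blob survives.
blob_1 = {"person": (2, "firstname=new,lastname=old")}
blob_2 = {"person": (3, "firstname=old,lastname=new")}
as_blob = merge_rows(blob_1, blob_2)

print({c: v for c, (_, v) in per_column.items()})
print(as_blob["person"][1])
```

With separate columns both changes survive; with the blob, whichever write carries the later timestamp replaces the other one entirely.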

On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad <jon@jonhaddad.com> wrote:
Those are two different cases though.  It *sounds like* (again, I may be missing the point) you're trying to overwrite a value with another value.  You're either going to serialize a blob and overwrite a single cell, or you're going to overwrite all the cells and include a tombstone.

When you do a read, reading a single tombstone vs a single value is essentially the same thing, performance-wise.

In your description you said "~ 20-100 events", and you're overwriting the event each time, so I don't know how you get to 10K tombstones either.  Compaction will bring multiple tombstones together for a cell in the same way it compacts multiple values for a single cell.
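
The compaction point can be illustrated with a toy model. It is a simplification under stated assumptions: one row, integer timestamps, `None` standing in for a tombstone, and a made-up `compact` function; it ignores SSTable layout, gc_grace_seconds and when compaction actually gets to run.

```python
from typing import Dict, List, Optional, Tuple

# (column, write timestamp, value); value None stands for a tombstone
Write = Tuple[str, int, Optional[str]]


def compact(writes: List[Write]) -> Dict[str, Tuple[int, Optional[str]]]:
    """Keep only the newest version of each cell, tombstone or not."""
    newest: Dict[str, Tuple[int, Optional[str]]] = {}
    for column, ts, value in writes:
        if column not in newest or ts > newest[column][0]:
            newest[column] = (ts, value)
    return newest


# 100 events for one happening; each writes "total" and nulls out "specials".
writes: List[Write] = []
for ts in range(1, 101):
    writes.append(("total", ts, str(ts)))
    writes.append(("specials", ts, None))

before = sum(1 for _, _, v in writes if v is None)
after = sum(1 for _, v in compact(writes).values() if v is None)
print(before, after)
```

In this model the 100 tombstones written for the same cell collapse to one once all versions have been merged, just as the 100 values of `total` collapse to the latest one. Until compaction has merged the files holding the older versions, they are still on disk.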

It sounds to me like you're taking some advice about tombstones out of context and trying to apply the advice to a different problem.  Again, I might be misunderstanding what you're doing.


On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos <tomas.bartalos@gmail.com> wrote:
Hello Jon,

I thought having tombstones is much higher overhead than just overwriting values. The compaction overhead can be similar, but I think the read performance is much worse.

Tombstones accumulate and hang around for 10 days (by default) before they are eligible to be purged by compaction.

Also we have tombstone warning and error thresholds. If Cassandra scans more than 10 000 tombstones, she will abort the query.

According to this article:
https://opencredo.com/blogs/cassandra-tombstones-common-issues/

"The cassandra.yaml comments explain it perfectly: “When executing a scan, within or across a partition, we need to keep the tombstones seen in memory so we can return them to the coordinator, which will use them to make sure other replicas also know about the deleted rows. With workloads that generate a lot of tombstones, this can cause performance problems and even exhaust the server heap.”"

Regards,
Tomas

On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad <jon@jonhaddad.com> wrote:
If you're overwriting values, it really doesn't matter much if it's a tombstone or any other value, they still need to be compacted and have the same overhead at read time.

Tombstones are problematic when you try to use Cassandra as a queue (or something like a queue) and you need to scan over thousands of tombstones in order to get to the real data.  You're simply overwriting a row and trying to avoid a single tombstone.

Maybe I'm missing something here.  Why do you think overwriting a single cell with a tombstone is any worse than overwriting a single cell with a value?

Jon


On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos <tomas.bartalos@gmail.com> wrote:
Hello,

I believe your approach is the same as using Spark with "spark.cassandra.output.ignoreNulls=true".
This will not cover the situation when a value has to be overwritten with null.

I found one possible solution - change the schema to keep only the primary key fields and move all other fields to a frozen UDT.
create table (year, month, day, id, frozen<Event>, primary key((year, month, day), id) )
In this way anything that is null inside event doesn't create a tombstone, since event is serialized to a BLOB.
The penalty is the need to deserialize the whole Event when selecting only a few columns.
Can anyone confirm if this is a good solution performance-wise?

Thank you,

On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan <doanduyhai@gmail.com> wrote:
"The problem is I can't know the combination of set/unset values" --> Just for this requirement, Achilles has had a working solution for many years using the INSERT_NOT_NULL_FIELDS strategy:

https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy

Or you can use the Update API, which by design only performs updates on non-null fields: https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity


Behind the scenes, for each new combination of INSERT INTO table(x,y,z) statement, Achilles will check its prepared statement cache and, if the statement does not exist yet, create a new prepared statement and put it into the cache for later re-use for you.

Disclaimer: I'm the creator of Achilles



On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <tomas.bartalos@gmail.com> wrote:
Hello,

The problem is I can't know the combination of set/unset values. From my perspective every value should be set. The event from Kafka represents the complete state of the happening at a certain point in time. In my table I want to store the latest event, so the most recent state of the happening (in this table I don't care about the history). Actually I used the wrong expression, since it's just the opposite of an "incremental update": every event carries all data (state) for a specific point in time.

The event is represented as a nested JSON structure. Top-level elements of the JSON are table fields with types like text, boolean, timestamp, list, and the nested elements are UDT fields.

Simplified example:
There is a new purchase for the happening, event:
{total_amount: 50, items : [A, B, C, new_item], purchase_time : '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
I don't know what actually happened for this event: maybe there is a new item purchased, maybe some customer info has been changed, maybe the specials have been revoked and I have to reset them. I just need to store the state as it arrived from Kafka; there might already be an event for this happening saved before, or maybe this is the first one.

BR,
Tomas


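On Thu, 27 Dec 2018, 9:36 pm Eric Stevens <mightye@gmail.com> wrote:
Depending on the use case, creating separate prepared statements for each combination of set / unset values in large INSERT/UPDATE statements may be prohibitive.

Instead, you can look into driver level support for UNSET values.  Requires Cassandra 2.2 or later IIRC.

See:
Java Driver: https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
Python Driver: https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
Node Driver: https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset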
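
A minimal sketch of the null-versus-unset distinction when binding to one fixed prepared statement. It uses a local stand-in marker so it runs without a cluster; as far as I know the DataStax Python driver provides its own marker, `UNSET_VALUE` in `cassandra.query`, for this purpose. `bind_values` and the column list are invented for the example.

```python
from typing import Any, Dict, List, Sequence


class _Unset:
    """Local stand-in for a driver's 'unset' marker (hypothetical)."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


def bind_values(columns: Sequence[str], event: Dict[str, Any]) -> List[Any]:
    """Build bind values for one fixed prepared statement.

    Missing key   -> UNSET (column is left untouched, no tombstone)
    Explicit None -> None  (column is deleted, which writes a tombstone)
    """
    return [event.get(column, UNSET) for column in columns]


COLUMNS = ["id", "event", "a", "b", "c"]
values = bind_values(COLUMNS, {"id": "MainEvent", "b": "9:30 pm", "c": None})
print(values)
```

This keeps a single prepared statement for all shapes of input, but it only helps when the application can tell "not provided" apart from "explicitly null".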


On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
You say the events are incremental updates. I am interpreting this to mean only some columns are updated. Others should keep their original values.

You are correct that inserting null creates a tombstone.

Can you only insert the columns that actually have new values? Just skip the columns with no information. (Make the insert generator a bit smarter.)
Create table happening (id text primary key, event text, a text, b text, c text);
Insert into happening (id, event, a, b, c) values ('MainEvent','The most complete info we have right now','Priceless','10 pm','Grand Ballroom');
-- b changes
Insert into happening (id, b) values ('MainEvent','9:30 pm');


Sean Durity


-----Original Message-----
From: Tomas Bartalos <tomas.bartalos@gmail.com>
Sent: Thursday, December 27, 2018 9:27 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values

Hello,

I’d start with describing my use case and how I’d like to use Cassandra to solve my storage needs.
We're processing a stream of events for various happenings. Every event has a unique happening_id.
One happening may have many events, usually ~ 20-100 events. I’d like to store only the latest event for the same happening (Event is an incremental update and it contains all up-to-date data about the happening).
Technically the events are streamed from Kafka, processed with Spark and saved to Cassandra.
In Cassandra we use upserts (insert with same primary key).  So far so good, however there comes the tombstone...

When I’m inserting a field with a NULL value, Cassandra creates a tombstone for this field. As I understand it, this is due to space efficiency: Cassandra doesn’t have to remember there is a NULL value, she just deletes the respective column, and a delete creates a ... tombstone.
I was hoping there could be an option to tell Cassandra not to be so space efficient and to store “unset” info without generating tombstones.
Something similar to inserting empty strings instead of null values:

CREATE TABLE happening (id text PRIMARY KEY, event text);
insert into happening (id, event) values ('1', 'event1');
-- tombstone is generated
insert into happening (id, event) values ('1', null);
-- tombstone is not generated
insert into happening (id, event) values ('1', '');

Possible solutions:
1. Disable tombstones with gc_grace_seconds = 0 or set it to a reasonably low value (1 hour?). Not good, since phantom data may re-appear.
2. Ignore NULLs on the Spark side with “spark.cassandra.output.ignoreNulls=true”. Not good, since this will never overwrite a previously inserted event field with an “empty” one.
3. On inserts with Spark, find all NULL values and replace them with an “empty” equivalent (empty string for text, 0 for integer). Very inefficient, and it is problematic to find an “empty” equivalent for some data types.
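
Option 3 might look roughly like the sketch below. The mapping of types to "empty" values only contains the two cases named in the list (text and integer); `replace_nulls`, the schema dictionary and its column names are assumptions for illustration. Types without an entry raise an error, which is the problem described in option 3.

```python
from typing import Any, Dict

# Assumed per-type "empty" stand-ins; text and int come from option 3 above,
# everything else is left out on purpose.
EMPTY_BY_TYPE: Dict[str, Any] = {"text": "", "int": 0}


def replace_nulls(row: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Swap None for the type's empty stand-in; fail if there is none."""
    out = {}
    for column, value in row.items():
        if value is None:
            cql_type = schema[column]
            if cql_type not in EMPTY_BY_TYPE:
                raise ValueError(f"no empty equivalent for {column} ({cql_type})")
            value = EMPTY_BY_TYPE[cql_type]
        out[column] = value
    return out


schema = {"id": "text", "event": "text", "total_amount": "int",
          "purchase_time": "timestamp"}
cleaned = replace_nulls({"id": "1", "event": None, "total_amount": None}, schema)
print(cleaned)
```

A stored 0 can no longer be told apart from a genuine 0, so readers of the table have to know about the convention.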

Until tombstones appeared Cassandra was the right fit for our use case, however now I’m not sure if we’re heading in the right direction.
Could you please give me some advice on how to solve this problem?

Thank you,
Tomas
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org




--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

