Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: message received from 54.164.171.186
 which is an MX secondary for user@cassandra.apache.org)
MIME-Version: 1.0
In-Reply-To: 
 <CAEDUwd1X-L0vqySu8bGCv1bbRLcfa5p2JxHcPWJo2RfTiXgdng@mail.gmail.com>
References: <08a04f153c668712b26719ba69b92e2a@mail.gmail.com>
 <720FB8FD-93B6-44B0-8994-1B6F23DFCC58@fold3.com>
 <43ed4bfb16d497b8f81d317b79b94355@mail.gmail.com>
 <CAORswtxY6bzD2G4Re8nb_ee5VMp=nJf3NYMe4cwfz1ZNDpwCjA@mail.gmail.com>
 <CAEDUwd1X-L0vqySu8bGCv1bbRLcfa5p2JxHcPWJo2RfTiXgdng@mail.gmail.com>
From: Eric Stevens <mightye@gmail.com>
Date: Wed, 6 May 2015 06:37:27 -0700
Message-ID: 
 <CAORswtxHqod5Y-CXOPaqLmbDVhCBhMn+Me9W3jBb2bueJrE7kw@mail.gmail.com>
Subject: Re: Inserting null values
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=001a1149017c9a1e51051569e648

--001a1149017c9a1e51051569e648
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I agree that inserting null is not as good as not inserting that column at
all when you have confidence that you are not shadowing any underlying
data. But pragmatically speaking it really doesn't sound like a small
number of incidental nulls/tombstones (< 20% of columns, otherwise
CASSANDRA-3442 takes over) is going to have any performance impact either
in your query patterns or in compaction in any practical sense.
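
To make the "don't insert the column at all" option concrete, here is a
rough Python sketch that drops null-valued columns before building the
INSERT. The table and column names are made up, the helper names are my
own, and the ? placeholders assume prepared-statement style binding. It is
only appropriate when you are sure no older value needs to be shadowed.

```python
def non_null_columns(row):
    """Return (name, value) pairs for columns that carry a real value."""
    return [(n, v) for n, v in row.items() if v is not None]


def render_insert(table, pairs):
    """Render a parameterized INSERT plus its bind values."""
    return (
        "INSERT INTO {} ({}) VALUES ({})".format(
            table,
            ", ".join(name for name, _ in pairs),
            ", ".join("?" for _ in pairs),
        ),
        [value for _, value in pairs],
    )


def build_insert(table, row):
    """Build an INSERT that omits null columns, so no tombstone is
    written for them. Hypothetical helper, not a driver API."""
    return render_insert(table, non_null_columns(row))
```

For a row like {"id": 1, "name": "alice", "email": None},
build_insert("users", row) yields
INSERT INTO users (id, name) VALUES (?, ?) with bind values [1, "alice"],
so the email column is never written.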

If INSERT of null values is problematic for small portions of your data,
then it stands to reason that an INSERT option that prevents tombstone
creation would be an important performance optimization (it would also
address the fact that non-null collections generate tombstones on INSERT).
Something like:  INSERT INTO ... USING no_tombstones;


> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to
clustering scans, right?  I.e., tombstones don't count against those
thresholds if they are not part of the clustering key column being
considered for the non-EQ relation?  The documentation certainly implies
so:

tombstone_warn_threshold
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCa=
ssandra_yaml_r.html?scroll=3Dreference_ds_qfg_n1r_1k__tombstone_warn_thresh=
old>
(Default: 1000) The maximum number of tombstones a query can scan before
warning.

tombstone_failure_threshold
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCa=
ssandra_yaml_r.html?scroll=3Dreference_ds_qfg_n1r_1k__tombstone_failure_thr=
eshold>
(Default: 100000) The maximum number of tombstones a query can scan before
aborting.
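
Read literally, those two settings describe a simple two-level check on
the number of tombstones a query scans. A toy Python model of that reading
follows; the function name is made up, and whether the real comparison is
strict (>) is my assumption. It says nothing about which tombstones get
counted, which is the open question above.

```python
def classify_scan(tombstones_scanned, warn_threshold, failure_threshold):
    """Toy model of the documented thresholds, not Cassandra's code."""
    # Assumption: a limit is exceeded only when strictly greater.
    if tombstones_scanned > failure_threshold:
        return "abort"
    if tombstones_scanned > warn_threshold:
        return "warn"
    return "ok"
```

With the documented defaults (1000 and 100000), a scan over 1001
tombstones would warn and one over 100001 would abort under this model.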

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens <mightye@gmail.com> wrote:
>
>> In the end, inserting a tombstone into a non-clustered column shouldn't
>> be appreciably worse (if it is at all) than inserting a value instead.
>> Or am I missing something here?
>>
>
> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.
>
> Given that tombstones are often smaller than data columns, sorta hard to
> understand conceptually?
>
> =3DRob
>
>


--001a1149017c9a1e51051569e648--