From: Eric Stevens
Date: Wed, 6 May 2015 06:37:27 -0700
Subject: Re: Inserting null values
To: "user@cassandra.apache.org"

I agree that inserting null is not as good as not inserting that column at all when you have confidence that you are not shadowing any underlying data. But pragmatically speaking, it really doesn't sound like a small number of incidental nulls/tombstones (< 20% of columns, otherwise CASSANDRA-3442 takes over) is going to have any performance impact, either in your query patterns or in compaction, in any practical sense.

If INSERT of null values is problematic for small portions of your data, then it stands to reason that an INSERT option containing an instruction to prevent tombstone creation would be an important performance optimization (and would also address the fact that non-null collections generate tombstones on INSERT as well).

    INSERT INTO ... USING no_tombstones;

> There's thresholds (log messages, etc.) which operate on tombstone counts over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to clustering scans, right? I.e., tombstones don't count against those thresholds if they are not part of the clustering key column being considered for the non-EQ relation? The documentation certainly implies so:

tombstone_warn_threshold
(Default: 1000) The maximum number of tombstones a query can scan before warning.

tombstone_failure_threshold
(Default: 100000) The maximum number of tombstones a query can scan before aborting.

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli wrote:

> On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens wrote:
>
>> In the end, inserting a tombstone into a non-clustered column shouldn't
>> be appreciably worse (if it is at all) than inserting a value instead. Or
>> am I missing something here?
>
> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.
>
> Given that tombstones are often smaller than data columns, sorta hard to
> understand conceptually?
>
> =Rob
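For anyone who wants to avoid the incidental tombstones on the client side in the meantime, here is a minimal sketch of the "don't insert that column at all" approach. The helper name build_insert, the table and column names, and the "?" placeholder style are all assumptions made for illustration, not part of any driver's API. It is only appropriate when you are confident you are not shadowing underlying data, since an omitted column leaves any previously written value in place.

```python
from typing import Any, Dict, List, Tuple


def build_insert(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized CQL INSERT that skips columns whose value is None.

    Hypothetical helper: leaving a column out of the INSERT means no null is
    written for it, so no tombstone is created for that column.
    """
    # Keep only the columns that actually carry a value.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")

    columns = ", ".join(present)
    # "?" placeholders are an assumption; adapt to your client's bind syntax.
    placeholders = ", ".join("?" for _ in present)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(present.values())


if __name__ == "__main__":
    stmt, params = build_insert("users", {"id": 1, "name": "a", "email": None})
    print(stmt)
    print(params)
```

The trade-off is that each distinct set of non-null columns yields a different statement text, so a client that prepares statements would end up with one prepared statement per column combination.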