From: Jonathan Haddad
Date: Fri, 12 Dec 2014 17:58:55 +0000
Subject: Re: batch_size_warn_threshold_in_kb
To: user@cassandra.apache.org, Ryan Svihla

The important thing to take away from Ryan's original post is that batches are not there for performance. The only case where I consider batches useful is when you absolutely need to know that several tables all get a mutation (via logged batches). The use case for this is when you've got multiple tables serving as different views of the same data. A batch is absolutely not going to help you if you're trying to lump queries together to reduce network and server overhead; in fact, it'll do the opposite. If that's what you're after, perform many async queries instead. The overhead of batches in Cassandra is significant, and you're going to hit a lot of problems (timeouts, failures) if you use them excessively.

tl;dr: you probably don't want batch, you most likely want many async calls
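
To make the "many async calls" suggestion concrete, here is a minimal sketch of the pattern, not a definitive implementation. It uses only Python's asyncio; `fake_write`, `write_all`, `run_demo` and the `max_in_flight` limit are hypothetical names invented for illustration, and `fake_write` merely stands in for whatever asynchronous single-statement write your driver provides. Each row is sent as its own request, and a semaphore caps how many are pending at once.

```python
import asyncio


async def fake_write(row, store):
    """Hypothetical stand-in for one asynchronous single-row write."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    store[row["id"]] = row["value"]


async def write_all(rows, store, max_in_flight=32):
    """Issue one independent write per row, with at most max_in_flight pending."""
    limiter = asyncio.Semaphore(max_in_flight)

    async def guarded(row):
        async with limiter:
            await fake_write(row, store)

    # Each write succeeds or fails on its own; nothing is grouped into a batch.
    results = await asyncio.gather(
        *(guarded(row) for row in rows), return_exceptions=True
    )
    # Return only the exceptions so the caller can retry those rows.
    return [r for r in results if isinstance(r, Exception)]


def run_demo(n=100):
    """Write n illustrative rows into an in-memory dict and report failures."""
    store = {}
    rows = [{"id": i, "value": i * i} for i in range(n)]
    failures = asyncio.run(write_all(rows, store))
    return store, failures
```

The point of the design is that a slow or failed write affects only its own row and can be retried individually, which is the opposite of lumping unrelated mutations into one large request.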

On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller <mohammed@glassbeam.com> wrote:

> Ryan,
>
> Thanks for the quick response.
>
> I did see that jira before posting my question on this list. However, I didn’t see any information about why 5kb+ data will cause instability. 5kb or even 50kb seems too small. For example, if each mutation is 1000+ bytes, then with just 5 mutations, you will hit that threshold.
>
> In addition, Patrick is saying that he does not recommend more than 100 mutations per batch. So why not warn users just on the # of mutations in a batch?
>
> Mohammed
>
> From: Ryan Svihla [mailto:rsvihla@datastax.com]
> Sent: Thursday, December 11, 2014 12:56 PM
> To: user@cassandra.apache.org
> Subject: Re: batch_size_warn_threshold_in_kb
>
> Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here
>
> https://issues.apache.org/jira/browse/CASSANDRA-6487
>
> Key reasoning for the desire comes from Patrick McFadden:
>
> "Yes that was in bytes. Just in my own experience, I don't recommend more than ~100 mutations per batch. Doing some quick math I came up with 5k as 100 x 50 byte mutations.
>
> Totally up for debate."
>
> It's totally changeable, however, it's there in no small part because so many people confuse the BATCH keyword as a performance optimization, this helps flag those cases of misuse.
>
> On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller <mohammed@glassbeam.com> wrote:
>
> Hi –
>
> The cassandra.yaml file has property called batch_size_warn_threshold_in_kb.
>
> The default size is 5kb and according to the comments in the yaml file, it is used to log WARN on any batch size exceeding this value in kilobytes. It says caution should be taken on increasing the size of this threshold as it can lead to node instability.
>
> Does anybody know the significance of this magic number 5kb? Why would a higher number (say 10kb) lead to node instability?
>
> Mohammed
>
> --
> Ryan Svihla
> Solution Architect
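
The sizing arithmetic in the quoted thread (100 mutations of about 50 bytes, or 5 mutations of 1000+ bytes, against a 5kb warning threshold) can be checked with a small sketch. The function names are hypothetical, every mutation is assumed to be the same size, and `bytes_per_kb=1024` is an assumption about how the threshold is converted to bytes; pass 1000 to reproduce the round-number "5k" math quoted above. Real serialized mutations also carry overhead that this ignores.

```python
def warn_threshold_bytes(threshold_kb=5, bytes_per_kb=1024):
    """Warning threshold in bytes (bytes_per_kb is an assumption, see lead-in)."""
    return threshold_kb * bytes_per_kb


def max_mutations_without_warning(bytes_per_mutation, threshold_kb=5, bytes_per_kb=1024):
    """Largest count of equally sized mutations that does not exceed the threshold."""
    if bytes_per_mutation <= 0:
        raise ValueError("bytes_per_mutation must be positive")
    return warn_threshold_bytes(threshold_kb, bytes_per_kb) // bytes_per_mutation


def would_warn(mutation_count, bytes_per_mutation, threshold_kb=5, bytes_per_kb=1024):
    """True if the combined size of the mutations exceeds the warning threshold."""
    total_bytes = mutation_count * bytes_per_mutation
    return total_bytes > warn_threshold_bytes(threshold_kb, bytes_per_kb)
```

Under these assumptions, about 100 mutations of 50 bytes fit below the default threshold (102 with 1024-byte kilobytes, exactly 100 with 1000-byte ones), while 5 mutations of 1,100 bytes each already exceed it. That is why a byte-based warning and a count-based warning can disagree for the same batch.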