From: Nate McCall
To: Cassandra Users
Date: Wed, 21 Aug 2013 15:16:22 -0500
Subject: Re: insert performance (1.2.8)

The only thing I can think to suggest at this point is upping that batch size - say to 500 and see what happens.

Do you have any monitoring on this cluster? If not, what do you see as the output of 'nodetool tpstats' while you run this test?


On Wed, Aug 21, 2013 at 1:40 PM, Keith Freeman <8forty@gmail.com> wrote:
Building the giant batch string wasn't as bad as I thought, and at first I had great(!) results (using "unlogged" batches): 2500 rows/sec (batches of 100 in 48 threads) ran very smoothly, and the load on the cassandra server nodes averaged about 1.0 or less continuously.

But then I upped it to 5000 rows/sec, and the load on all 3 server nodes jumped to a continuous 8-10, with peaks over 14. I also tried running 2 separate clients at 2500 rows/sec with the same results. I don't see any compactions while at this load, so would this likely be the result of GC thrashing?

Seems like I'm spending a lot of effort and am still not getting very close to being able to insert 10k rows (10M of data each) per second, which is pretty disappointing.


On 08/20/2013 07:16 PM, Nate McCall wrote:
Thrift will allow for larger, free-form batch construction. The increase will come from doing a lot more in the same payload message. Otherwise CQL is more efficient.

If you do build those giant strings, yes, you should see a performance improvement.


On Tue, Aug 20, 2013 at 8:03 PM, Keith Freeman <8forty@gmail.com> wrote:
Thanks. Can you tell me why using thrift would improve performance?

Also, if I do try to build those giant strings for a prepared batch statement, should I expect another performance improvement?



On 08/20/2013 05:06 PM, Nate McCall wrote:
Ugh - sorry, I knew Sylvain and Michaël had worked on this recently but it is only in 2.0 - I could have sworn it got marked for inclusion back into 1.2 but I was wrong:
https://issues.apache.org/jira/browse/CASSANDRA-4693

This is indeed an issue if you don't know the column count beforehand (or have a very large number of them, as in your case). Again, apologies, I would not have recommended that route if I knew it was only in 2.0.

I would be willing to bet you could hit those insert numbers pretty easily with thrift given the shape of your mutation.


On Tue, Aug 20, 2013 at 5:00 PM, Keith Freeman <8forty@gmail.com> wrote:
So I tried inserting prepared statements separately (no batch), and my server nodes' load definitely dropped significantly. Throughput from my client improved a bit, but only a few %. I was able to *almost* get 5000 rows/sec (sort of) by also reducing the rows/insert-thread to 20-50 and eliminating all overhead from the timing, i.e. timing only the tight for loop of inserts. But that's still a lot slower than I expected.

I couldn't do batches because the driver doesn't allow prepared statements in a batch (QueryBuilder API). It appears the batch itself could possibly be a prepared statement, but since I have 40+ columns on each insert that would take some ugly code to build, so I haven't tried it yet.
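
For reference, the batch text described above can be generated rather than written by hand. The following is a minimal Java sketch, not the code used in this thread: the class name, the table name and the column list are all hypothetical. It emits one unlogged CQL batch with a `?` bind marker per column, which could then be prepared once and bound with rows x columns values in order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Builds the text of one unlogged CQL batch made of identical parameterized INSERTs. */
final class BatchCqlBuilder {

    /**
     * @param table       table name (hypothetical; substitute your own)
     * @param columns     column names, in the order values will be bound
     * @param rowsInBatch number of INSERT statements to place in the batch
     */
    static String build(String table, List<String> columns, int rowsInBatch) {
        if (columns.isEmpty() || rowsInBatch < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        String columnList = String.join(", ", columns);
        // One "?" bind marker per column.
        String markers = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String insert = "INSERT INTO " + table + " (" + columnList + ") VALUES (" + markers + ");";

        StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (int i = 0; i < rowsInBatch; i++) {
            cql.append("  ").append(insert).append('\n');
        }
        return cql.append("APPLY BATCH").toString();
    }

    public static void main(String[] args) {
        // Illustrative table and columns only; they are not taken from the thread.
        System.out.println(build("events", Arrays.asList("a", "b", "day", "ts", "x"), 2));
    }
}
```

With 40 columns and 100 rows per batch the statement carries 4000 bind markers, so the values have to be supplied row by row in the same column order used to build the text.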

I'm using CL "ONE" on the inserts and RF 2 in my schema.


On 08/20/2013 08:04 AM, Nate McCall wrote:
John makes a good point re: prepared statements (I'd increase batch sizes again once you did this as well - separate, incremental runs of course so you can gauge the effect of each). That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high though).

I'd actually be really interested as to what your results were after doing so - I've not tried any A/B testing here for prepared statements on inserts.

Given your load is on the server, I'm not sure adding more async indirection on the client would buy you too much though.

Also, at what RF and consistency level are you writing?


On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman <8forty@gmail.com> wrote:
Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes, would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, so something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem.

Of course I know there are obvious h/w things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than, say, MySQL, since write speed is supposed to be one of Cassandra's strengths.


On 08/19/2013 09:03 PM, John Sanda wrote:
I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side.
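
The asynchronous pattern suggested here can be illustrated without the driver. The sketch below is an assumption-laden stand-in, not driver code: it uses the JDK's `CompletableFuture` in place of the future returned by `Session.executeAsync` and the Guava callbacks, and the `executeAsync` function passed to the constructor is a hypothetical placeholder for binding a prepared statement and submitting it. It keeps a bounded number of requests in flight so the client never blocks on a single reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Submits rows asynchronously while capping the number of outstanding requests. */
final class BoundedAsyncWriter<R> {
    // Hypothetical stand-in for "bind the prepared statement and call executeAsync".
    private final Function<R, CompletableFuture<Void>> executeAsync;
    private final Semaphore inFlight;

    BoundedAsyncWriter(Function<R, CompletableFuture<Void>> executeAsync, int maxInFlight) {
        this.executeAsync = executeAsync;
        this.inFlight = new Semaphore(maxInFlight);
    }

    /** Submits every row, then blocks until all submitted requests have completed. */
    void writeAll(List<R> rows) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (R row : rows) {
            // Wait here only when maxInFlight requests are already outstanding.
            inFlight.acquireUninterruptibly();
            pending.add(executeAsync.apply(row)
                    // Free the permit whether the request succeeded or failed.
                    .whenComplete((ignored, error) -> inFlight.release()));
        }
        // join() rethrows the first failure wrapped in a CompletionException.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
    }
}
```

The cap matters because unbounded asynchronous submission only moves the queue from the client to the server; it does not by itself reduce the work each node has to do per row.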

On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman <8forty@gmail.com> wrote:
Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch.

I can't post my code, but my client does this on each iteration:
-- divides up the set of inserts by the number of threads
-- stores the current time
-- tells all the threads to send their inserts
-- then when they've all returned checks the elapsed time
At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer; it doesn't seem to make a lot of difference.
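
The per-iteration procedure listed above (split the inserts across threads, note the time, send, then measure once every thread has returned) can be sketched roughly as follows. This is an illustration only, since the actual client code was not posted: the class and method names are made up, and `sender` is a hypothetical stand-in for whatever sends one thread's share of rows, such as a synchronous Session.execute() on a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class InsertTimer {

    /** Splits rows into at most `parts` contiguous slices of near-equal size. */
    static <T> List<List<T>> partition(List<T> rows, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int size = (rows.size() + parts - 1) / parts; // ceiling division
        for (int from = 0; from < rows.size(); from += size) {
            slices.add(rows.subList(from, Math.min(from + size, rows.size())));
        }
        return slices;
    }

    /** Sends each slice on a pool thread and returns the elapsed wall-clock nanoseconds. */
    static <T> long timeIteration(List<T> rows, int threads, Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<T> slice : partition(rows, threads)) {
                tasks.add(() -> { sender.accept(slice); return null; });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has finished.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get(); // surfaces any exception thrown by a sender
            }
            return System.nanoTime() - start;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("insert iteration failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the timed region here includes worker-thread startup, so a long-lived pool reused across iterations would give cleaner numbers.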


On 08/19/2013 05:00 PM, Nate McCall wrote:
How big are the batch sizes? In other words, how many rows are you sending per insert operation?

Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally).

On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman <8forty@gmail.com> wrote:
I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5GHz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using "wide rows" (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z).

My java client is feeding rows (about 1k of raw data size each) in batches using multiple threads, and the fastest I can get it to run reliably is about 2000 rows/second. Even at that speed, all 3 cassandra nodes are very CPU bound, with loads of 6-9 each (and the client machine is hardly breaking a sweat). I've tried turning off compression in my table, which reduced the loads slightly but not much. There are no other updates or reads occurring, except the DataStax OpsCenter.

I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and google, can't really figure out what's slowing my client down. When I increase the insert speed of my client beyond 2000/second, the server responses are just too slow and the client falls behind. I had a single-node MySQL database that can handle 10k of these data rows/second, so I really feel like I'm missing something in Cassandra. Any ideas?






--

- John







