From user-return-60448-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org Sun Mar 18 11:17:10 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 4026C180645 for ; Sun, 18 Mar 2018 11:17:09 +0100 (CET) Received: (qmail 98089 invoked by uid 500); 18 Mar 2018 10:17:07 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 98074 invoked by uid 99); 18 Mar 2018 10:17:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 18 Mar 2018 10:17:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id DDA83C048B for ; Sun, 18 Mar 2018 10:17:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.899 X-Spam-Level: * X-Spam-Status: No, score=1.899 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); domainkeys=pass (768-bit key) header.from=onmstester@zoho.com header.d=zoho.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id P2FURXJgqpId for ; Sun, 18 Mar 2018 10:17:04 +0000 (UTC) Received: from sender-pp-092.zoho.com (sender-pp-092.zoho.com [135.84.80.237]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 
975315F2F2 for ; Sun, 18 Mar 2018 10:17:03 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=zapps768; d=zoho.com; h=date:from:to:message-id:in-reply-to:references:subject:mime-version:content-type:user-agent; b=mgDLRFkzqpJikUkavezLpAdNsk7vni5ba212QNOGeN1B4prRsxLlTsOI+ay6yjZ5fgARDn/tx3l7 1CYkhiLOn+cx2JfHqUjWhx6ATvMe4r/Pu+t7hC1n+kvkgxJUG9Yp Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1521368220310470.03970170690457; Sun, 18 Mar 2018 03:17:00 -0700 (PDT) Received: from [81.91.136.82] by mail.zoho.com with HTTP;Sun, 18 Mar 2018 03:17:00 -0700 (PDT) Date: Sun, 18 Mar 2018 13:47:00 +0330 From: onmstester onmstester To: "user" Message-Id: <162389cc291.10a3b2a8f42054.9083610208493499676@zoho.com> In-Reply-To: References: <16238350c5d.11347609041772.8824338040081188896@zoho.com> <162386ed2fb.12555102e41938.995093679487811109@zoho.com> Subject: Re: Cassandra client tuning MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_106478_957788003.1521368220306" X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail ------=_Part_106478_957788003.1521368220306 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

I'm using a queue of 100 executeAsync calls * 1000 statements in each batch = the 100K insert queue of the non-batch scenario.
Using more than 1000 statements per batch throws a batch limit exception, and some documents recommend not changing batch_size_limit?!

Sent using Zoho Mail

---- On Sun, 18 Mar 2018 13:14:54 +0330 Ben Slater <ben.slater@instaclustr.com> wrote ----

When you say batch was worse than async in terms of throughput, are you comparing throughput with the same number of threads or something?
I would have thought if you have much less CPU usage on the client with batching and your Cassandra cluster doesn’t sound terribly stressed then there is room to increase threads on the client to up throughput (unless you're bottlenecked on IO or something)?

On Sun, 18 Mar 2018 at 20:27 onmstester onmstester <onmstester@zoho.com> wrote:

--
Ben Slater
Chief Product Officer

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.

Input data does not preserve good locality, and I've already tested batch insert; it was worse than executeAsync in terms of throughput, but used much less CPU on the client side.

Sent using Zoho Mail

---- On Sun, 18 Mar 2018 12:46:02 +0330 Ben Slater <ben.slater@instaclustr.com> wrote ----

You will probably find grouping writes into small batches improves overall performance (if you are not doing it already). See the following presentation for some more info: https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes

Cheers
Ben

On Sun, 18 Mar 2018 at 19:23 onmstester onmstester <onmstester@zoho.com> wrote:

I need to insert some millions of records in seconds in Cassandra. Using one client with executeAsync and the following configs:

maxConnectionsPerHost = 5
maxRequestsPerHost = 32K
maxAsyncQueue at client side = 100K

I could achieve 25% of the throughput I need. Client CPU is more than 80%, and increasing the number of threads causes some executeAsync calls to fail, so the configs above are the best the client can handle. Cassandra nodes' CPU is less than 30% on average. The data has no locality with respect to partition keys, and I can't use the createSStable mechanism. Is there any tuning I'm missing on the client side? The server side is already tuned per the DataStax recommendations.

Sent using Zoho Mail

------=_Part_106478_957788003.1521368220306 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
------=_Part_106478_957788003.1521368220306--
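Two ideas from the thread above, grouping writes into small batches and keeping a fixed number of asynchronous requests outstanding, can be sketched together. This is a minimal illustration in plain Python, not driver code: `micro_batches`, `run_throttled`, the `Row` shape and the default sizes are all hypothetical names and values chosen for the example. `execute_async` stands in for whatever call sends one request and returns a future, and a real driver's future type would probably need a small adapter. Grouping by partition key is one common reading of "small batches". With input that has little key locality, as described in the thread, most batches would stay small. Note also that Cassandra's server-side limit is on batch size in KB (`batch_size_fail_threshold_in_kb` in cassandra.yaml), not on statement count.

```python
import threading
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Hashable, dict]  # (partition key, column values); hypothetical row shape


def micro_batches(rows: Iterable[Row], max_batch_size: int = 20) -> Iterator[List[Row]]:
    """Yield lists of rows sharing one partition key, at most max_batch_size each."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    pending: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        key = row[0]
        bucket = pending[key]
        bucket.append(row)
        if len(bucket) >= max_batch_size:
            del pending[key]  # start a fresh bucket for this key next time
            yield bucket
    # Flush the partially filled buckets once the input is exhausted.
    yield from pending.values()


def run_throttled(
    items: Iterable[object],
    execute_async: Callable[[object], Future],
    max_in_flight: int = 1024,
) -> List[BaseException]:
    """Submit every item, keeping at most max_in_flight requests outstanding.

    Returns the exceptions raised by failed requests (empty if none failed).
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    slots = threading.BoundedSemaphore(max_in_flight)
    errors: List[BaseException] = []
    errors_lock = threading.Lock()

    def on_done(future: Future) -> None:
        try:
            exc = None if future.cancelled() else future.exception()
            if exc is not None:
                with errors_lock:
                    errors.append(exc)
        finally:
            slots.release()  # free the slot whether the request succeeded or not

    for item in items:
        slots.acquire()  # blocks the producer instead of growing a client-side queue
        try:
            future = execute_async(item)
        except BaseException:
            slots.release()
            raise
        future.add_done_callback(on_done)

    # Wait for the tail: once every slot can be taken, nothing is in flight.
    for _ in range(max_in_flight):
        slots.acquire()
    return errors


if __name__ == "__main__":
    rows = [(i % 7, {"value": i}) for i in range(100)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.submit(len, batch) is a stand-in for sending one batch to the cluster.
        failures = run_throttled(
            micro_batches(rows, max_batch_size=5),
            lambda batch: pool.submit(len, batch),
            max_in_flight=8,
        )
    print(f"{len(failures)} failed batches")
```

Blocking the producer on a semaphore trades a large client-side queue for back-pressure, so `max_in_flight` becomes the single knob to tune against client CPU and request failures.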