From: Graham Sanderson
Subject: Re: Cassandra Tuning Issue
Date: Sun, 6 Dec 2015 10:40:06 -0600
To: user@cassandra.apache.org

What version of C* are you using, and what JVM version? You showed a partial GC config, but if that is still CMS (not G1) then you are going to have insane GC pauses.

Depending on the C* version, are you using on- or off-heap memtables, and which type?

Those are the sorts of fat-node issues I'd be worried about. We run very nicely at 20G total heap and 8G new; the rest of our 128G of memory is disk cache/mmap and all of the off-heap stuff, so it doesn't go to waste.

That said, I think Jack is probably on the right path with overloaded coordinators, though you'd still expect to see CPU usage unless your timeouts are too low for the load. In that case the coordinator would be getting no responses in time, and quite possibly the other nodes are just dropping the mutations (since they don't get to them before they know the coordinator would have timed out). I forget the command to check dropped mutations off the top of my head, but you can see it in OpsCenter.

If you have GC problems you would certainly expect to see GC CPU usage, but depending on how long you run your tests, it might take you a little while to run through 40G.

I'm personally not a fan of >32G (ish) heaps, as you can't use compressed oops, and it is also unrealistic for CMS.
The word is that G1 is now working OK with C*, especially on newer C* and JDK versions; that said, it takes quite a lot of throughput to require insane quantities of young gen. We are guessing that when we remove all our legacy Thrift batch inserts we will need less, and as for 20G total, we actually don't need that much (we dropped from 24 when we moved memtables off heap, and believe we can drop further).

Sent from my iPhone

> On Dec 6, 2015, at 9:07 AM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
>
> What replication factor are you using? Even if your writes use CL.ONE, Cassandra will be attempting writes to the replica nodes in the background.
>
> Are your writes "token aware"? If not, the receiving node has the overhead of forwarding the request to the node that owns the token for the primary key.
>
> For the record, Cassandra is not designed and optimized for so-called "fat nodes". The design focus is "commodity hardware" and a "distributed cluster" (typically a dozen or more nodes).
>
> That said, it would be good if we had a rule of thumb for how many simultaneous requests a node can handle, both external requests and inter-node traffic. I think there is an open Jira to enforce a limit on in-flight requests so that nodes don't get overloaded and start failing in the middle of writes, as you seem to be seeing.
>
> -- Jack Krupansky
>
>> On Sun, Dec 6, 2015 at 9:29 AM, jerry <xutom2006@126.com> wrote:
>> Dear All,
>>
>> Now I have a 4-node Cassandra cluster, and I want to know the highest performance of my Cassandra cluster.
>> I wrote a Java client to batch-insert data into all 4 Cassandra nodes. When I start fewer than 30 threads in my client application, everything is fine, but when I start more than 80 or 100 threads there are too many timeout exceptions (such as: "Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)"). And no matter how many threads I use, even if I start multiple clients with multiple threads on different computers, the highest performance I can get is about 60000-80000 TPS. By the way, each row I insert into Cassandra is about 130 bytes.
>> My 4 Cassandra nodes are:
>>     CPU: 4*15
>>     Memory: 512G
>>     Disk: flash card (only one disk, but better than SSD)
>> My Cassandra configuration is:
>>     MAX_HEAP_SIZE: 60G
>>     NEW_HEAP_SIZE: 40G
>>
>> When I insert data into my Cassandra cluster, no node has reached a bottleneck in CPU, memory, or disk; each of the three main hardware resources is idle. So I think maybe there is something wrong with the configuration of my Cassandra cluster. Can somebody please help me with my Cassandra tuning? Thanks in advance!
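[Editor's note: Jack's point about an in-flight request limit also applies on the client side — piling on more threads past 80-100 just creates timeout storms. A minimal sketch of client-side backpressure with a Semaphore is below. `ThrottledWriter` and the simulated insert are hypothetical names for illustration; a real client would pass `session.executeAsync(...)` from the DataStax Java driver as the work item, and wrap its load balancing policy in `TokenAwarePolicy` to address the forwarding overhead Jack describes.]

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap the number of in-flight writes with a Semaphore so that adding
// more client threads cannot push more concurrent work at the cluster than it
// can absorb. The insert itself is a stand-in Runnable, not a real driver call.
public class ThrottledWriter {
    private final Semaphore inFlight;
    private final ExecutorService workers = Executors.newFixedThreadPool(8);
    private final AtomicInteger completed = new AtomicInteger();

    public ThrottledWriter(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    public void write(Runnable insert) throws InterruptedException {
        inFlight.acquire();               // blocks once maxInFlight writes are pending
        workers.submit(() -> {
            try {
                insert.run();             // stand-in for the actual async insert
                completed.incrementAndGet();
            } finally {
                inFlight.release();       // frees a slot for the next write
            }
        });
    }

    public int drain() throws InterruptedException {
        workers.shutdown();               // finish all submitted writes
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        ThrottledWriter w = new ThrottledWriter(64);   // at most 64 pending writes
        for (int i = 0; i < 1000; i++) {
            w.write(() -> { /* simulated 130-byte row insert */ });
        }
        System.out.println(w.drain());    // prints 1000
    }
}
```

With a cap like this, throughput is tuned by raising `maxInFlight` until latency starts to climb, rather than by adding threads until the coordinators start timing out.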
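[Editor's note: the command Graham couldn't recall is `nodetool tpstats`, whose "Dropped" section reports dropped MUTATION messages. Below is a sketch of that check plus cassandra-env.sh heap settings matching the numbers Graham reports working; the exact values are his, not a universal recommendation, and the right sizes depend on workload.]

```shell
# Check each node for dropped mutations -- the counter Graham refers to
# (also visible in OpsCenter):
nodetool tpstats

# cassandra-env.sh: heap settings closer to Graham's working configuration.
# A 60G CMS heap with a 40G new gen invites very long pauses; staying at or
# below ~32G also keeps compressed oops enabled. (The env variable is
# HEAP_NEWSIZE; "NEW_HEAP_SIZE" in the original mail looks like a typo.)
MAX_HEAP_SIZE="20G"
HEAP_NEWSIZE="8G"
```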