From: Yuan Fang <yuan@kryptoncloud.com>
Date: Wed, 13 Jul 2016 10:34:46 -0700
Subject:
Re: Is my cluster normal?
To: user@cassandra.apache.org

In addition, compaction seems to run very often. It happens every couple of
seconds, one compaction right after another, and it seems to be causing the
high load.

On Wed, Jul 13, 2016 at 10:32 AM, Yuan Fang wrote:

> $nodetool tpstats
>
> ...
> Pool Name                   Active  Pending   Completed  Blocked  All time blocked
> Native-Transport-Requests      128      128  1420623949        1         142821509
> ...
>
> What is this? Is it normal?
>
> On Tue, Jul 12, 2016 at 3:03 PM, Yuan Fang wrote:
>
>> Hi Jonathan,
>>
>> Here is the result:
>>
>> ubuntu@ip-172-31-44-250:~$ iostat -dmx 2 10
>> Linux 3.13.0-74-generic (ip-172-31-44-250)  07/12/2016  _x86_64_  (4 CPU)
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.01    2.13    0.74    1.55   0.01   0.02     27.77      0.00   0.74     0.89     0.66   0.43   0.10
>> xvdf       0.01    0.58  237.41   52.50  12.90   6.21    135.02      2.32   8.01     3.65    27.72   0.57  16.63
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    7.50    0.00    2.50   0.00   0.04     32.00      0.00   1.60     0.00     1.60   1.60   0.40
>> xvdf       0.00    0.00  353.50    0.00  24.12   0.00    139.75      0.49   1.37     1.37     0.00   0.58  20.60
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    0.00    0.00    1.00   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    2.00  463.50   35.00  30.69   2.86    137.84      0.88   1.77     1.29     8.17   0.60  30.00
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    0.00    0.00    1.00   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    0.00   99.50   36.00   8.54   4.40    195.62      1.55   3.88     1.45    10.61   1.06  14.40
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    5.00    0.00    1.50   0.00   0.03     34.67      0.00   1.33     0.00     1.33   1.33   0.20
>> xvdf       0.00    1.50  703.00  195.00  48.83  23.76    165.57      6.49   8.36     1.66    32.51   0.55  49.80
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    0.00    0.00    1.00   0.00   0.04     72.00      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    2.50  149.50   69.50  10.12   6.68    157.14      0.74   3.42     1.18     8.23   0.51  11.20
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    5.00    0.00    2.50   0.00   0.03     24.00      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    0.00   61.50   22.50   5.36   2.75    197.64      0.33   3.93     1.50    10.58   0.88   7.40
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    0.00    0.00    0.50   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    0.00  375.00    0.00  24.84   0.00    135.64      0.45   1.20     1.20     0.00   0.57  21.20
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    1.00    0.00    6.00   0.00   0.03      9.33      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    0.00  542.50   23.50  35.08   2.83    137.16      0.80   1.41     1.15     7.23   0.49  28.00
>>
>> Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvda       0.00    3.50    0.50    1.50   0.00   0.02     24.00      0.00   0.00     0.00     0.00   0.00   0.00
>> xvdf       0.00    1.50  272.00  153.50  16.18  18.67    167.73     14.32  33.66     1.39    90.84   0.81  34.60
>>
>> On Tue, Jul 12, 2016 at 12:34 PM, Jonathan Haddad wrote:
>>
>>> When you have high system load it means your CPU is waiting for
>>> *something*, and in my experience it's usually slow disk. A disk connected
>>> over network has been a culprit for me many times.
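[Editor's note: not part of the original thread.] One way to make sense of `iostat -dmx` samples like the ones above is to flag devices whose write latency or utilization spikes. A minimal sketch follows; the threshold values are illustrative assumptions, not tuning advice from the posters:

```python
def flag_saturation(rows, w_await_ms_limit=20.0, util_limit=80.0):
    """Flag devices whose write await or %util exceeds the given limits.

    rows: list of dicts with 'device', 'w_await' (ms) and 'util' (%) keys,
    as parsed from one `iostat -dmx` sample.
    """
    flagged = []
    for r in rows:
        if r["w_await"] > w_await_ms_limit or r["util"] > util_limit:
            flagged.append(r["device"])
    return flagged

# The worst xvdf sample above: w_await 90.84 ms even though %util is only 34.60,
# i.e. writes queue up badly during compaction bursts.
sample = [
    {"device": "xvda", "w_await": 0.00, "util": 0.00},
    {"device": "xvdf", "w_await": 90.84, "util": 34.60},
]
print(flag_saturation(sample))  # ['xvdf']
```

The point of the sketch: on EBS, `%util` alone can look modest while `w_await` shows the queue building, which matches Jonathan's "slow disk over network" suspicion.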
>>>
>>> On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad wrote:
>>>
>>>> Can you do:
>>>>
>>>> iostat -dmx 2 10
>>>>
>>>> On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> The read rate is low because we do not have many read operations
>>>>> right now.
>>>>>
>>>>> The heap is only 4GB:
>>>>>
>>>>> MAX_HEAP_SIZE=4GB
>>>>>
>>>>> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa wrote:
>>>>>
>>>>>> EBS iops scale with volume size.
>>>>>>
>>>>>> A 600G EBS volume only guarantees 1800 iops – if you're exhausting
>>>>>> those on writes, you're going to suffer on reads.
>>>>>>
>>>>>> You have a 16G server, and probably a good chunk of that allocated to
>>>>>> heap. Consequently, you have almost no page cache, so your reads are going
>>>>>> to hit the disk. Your reads being very low is not uncommon if you have no
>>>>>> page cache – the default settings for Cassandra (64k compression chunks)
>>>>>> are really inefficient for small reads served off of disk. If you drop the
>>>>>> compression chunk size (4k, for example), you'll probably see your read
>>>>>> throughput increase significantly, which will give you more iops for
>>>>>> the commitlog, so write throughput likely goes up, too.
>>>>>>
>>>>>> *From: *Jonathan Haddad
>>>>>> *Reply-To: *"user@cassandra.apache.org"
>>>>>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>>>>>> *To: *"user@cassandra.apache.org"
>>>>>> *Subject: *Re: Is my cluster normal?
>>>>>>
>>>>>> What's your CPU looking like? If it's low, check your IO with iostat
>>>>>> or dstat. I know some people have used EBS and say it's fine but I've been
>>>>>> burned too many times.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang wrote:
>>>>>>
>>>>>> Hi Riccardo,
>>>>>>
>>>>>> Very low IO-wait. About 0.3%.
>>>>>>
>>>>>> No stolen CPU.
It is a Cassandra-only instance. I did not see any
>>>>>> dropped messages.
>>>>>>
>>>>>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>>>>>
>>>>>> Pool Name                         Active  Pending  Completed  Blocked  All time blocked
>>>>>> MutationStage                          1        1  929509244        0                 0
>>>>>> ViewMutationStage                      0        0          0        0                 0
>>>>>> ReadStage                              4        0    4021570        0                 0
>>>>>> RequestResponseStage                   0        0  731477999        0                 0
>>>>>> ReadRepairStage                        0        0     165603        0                 0
>>>>>> CounterMutationStage                   0        0          0        0                 0
>>>>>> MiscStage                              0        0          0        0                 0
>>>>>> CompactionExecutor                     2       55      92022        0                 0
>>>>>> MemtableReclaimMemory                  0        0       1736        0                 0
>>>>>> PendingRangeCalculator                 0        0          6        0                 0
>>>>>> GossipStage                            0        0     345474        0                 0
>>>>>> SecondaryIndexManagement               0        0          0        0                 0
>>>>>> HintsDispatcher                        0        0          4        0                 0
>>>>>> MigrationStage                         0        0         35        0                 0
>>>>>> MemtablePostFlush                      0        0       1973        0                 0
>>>>>> ValidationExecutor                     0        0          0        0                 0
>>>>>> Sampler                                0        0          0        0                 0
>>>>>> MemtableFlushWriter                    0        0       1736        0                 0
>>>>>> InternalResponseStage                  0        0       5311        0                 0
>>>>>> AntiEntropyStage                       0        0          0        0                 0
>>>>>> CacheCleanupExecutor                   0        0          0        0                 0
>>>>>> Native-Transport-Requests            128      128  347508531        2          15891862
>>>>>>
>>>>>> Message type           Dropped
>>>>>> READ                         0
>>>>>> RANGE_SLICE                  0
>>>>>> _TRACE                       0
>>>>>> HINT                         0
>>>>>> MUTATION                     0
>>>>>> COUNTER_MUTATION             0
>>>>>> BATCH_STORE                  0
>>>>>> BATCH_REMOVE                 0
>>>>>> REQUEST_RESPONSE             0
>>>>>> PAGED_RANGE                  0
>>>>>> READ_REPAIR                  0
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari wrote:
>>>>>>
>>>>>> Hi Yuan,
>>>>>>
>>>>>> Your machine instance is 4 vcpus, that is 4 threads (not cores!!!),
>>>>>> aside from any Cassandra-specific discussion, a system load of 10 on a
>>>>>> 4-thread machine is way too much in my opinion. If that is the running
>>>>>> average system load, I would look deeper into the system details. Is that IO
>>>>>> wait? Is that stolen CPU? Is this a Cassandra-only instance, or are there
>>>>>> other processes pushing up the load?
>>>>>>
>>>>>> What does your "nodetool tpstats" say? How many dropped messages do
>>>>>> you have?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang wrote:
>>>>>>
>>>>>> Thanks Ben! From the post, it seems they got results similar to mine,
>>>>>> just a little better. Good to know.
>>>>>>
>>>>>> I am not sure whether a little fine-tuning of the heap memory will help
>>>>>> or not.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <ben.slater@instaclustr.com> wrote:
>>>>>>
>>>>>> Hi Yuan,
>>>>>>
>>>>>> You might find this blog post a useful comparison:
>>>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>>>
>>>>>> Although the focus is on Spark and Cassandra and multi-DC, there are
>>>>>> also some single-DC benchmarks of m4.xl
>>>>>> clusters, plus some discussion of how we went about benchmarking.
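[Editor's note: not part of the original thread.] The 1800-IOPS figure Jeff quotes earlier matches the published gp2 EBS baseline of 3 IOPS per provisioned GB, with a 100-IOPS floor; the 10,000-IOPS cap below is an assumption about the 2016-era per-volume limit. A quick sketch:

```python
def gp2_baseline_iops(size_gb):
    # gp2 baseline: 3 IOPS per GB, floored at 100 IOPS.
    # The 10000 cap is the assumed per-volume limit at the time of this thread.
    return min(max(3 * size_gb, 100), 10000)

# The 600GB data volumes described in this thread:
print(gp2_baseline_iops(600))  # 1800
```

So at roughly 700 reads/s plus 195 writes/s in the worst `iostat` sample, the xvdf volume is operating near half its guaranteed IOPS, leaving little headroom for compaction bursts.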
>>>>>>
>>>>>> Cheers
>>>>>> Ben
>>>>>>
>>>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang wrote:
>>>>>>
>>>>>> Yes, here is my stress test result:
>>>>>>
>>>>>> Results:
>>>>>> op rate                   : 12200 [WRITE:12200]
>>>>>> partition rate            : 12200 [WRITE:12200]
>>>>>> row rate                  : 12200 [WRITE:12200]
>>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>>> Total errors              : 0 [WRITE:0]
>>>>>> total gc count            : 0
>>>>>> total gc mb               : 0
>>>>>> total gc time (s)         : 0
>>>>>> avg gc time(ms)           : NaN
>>>>>> stdev gc time(ms)         : 0
>>>>>> Total operation time      : 00:01:21
>>>>>> END
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla wrote:
>>>>>>
>>>>>> Lots of variables you're leaving out.
>>>>>>
>>>>>> It depends on write size, whether you're using logged batches or not, the
>>>>>> consistency level, the RF, whether the writes come in bursts, etc.
>>>>>> However, that's all somewhat moot for determining "normal"; really you need a
>>>>>> baseline, as all those variables end up mattering a huge amount.
>>>>>>
>>>>>> I would suggest using cassandra-stress as a baseline and going from
>>>>>> there depending on what those numbers say (just pick the defaults).
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang wrote:
>>>>>>
>>>>>> Yes, it is about 8k writes per node.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle wrote:
>>>>>>
>>>>>> Are you saying 7k writes per node? or 30k writes per node?
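[Editor's note: not part of the original thread.] The stress summary above is internally consistent, which is worth verifying when reading someone else's benchmark: 1,000,000 partitions over a total operation time of 00:01:21 should roughly reproduce the reported op rate.

```python
total_partitions = 1_000_000
elapsed_s = 1 * 60 + 21  # "Total operation time : 00:01:21"
op_rate = total_partitions / elapsed_s
print(round(op_rate))  # 12346, in the same ballpark as the reported op rate of 12200
```

The small gap between 12346 and 12200 is expected, since the reported rate and the wall-clock total are measured slightly differently by the tool.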
>>>>>>
>>>>>> .......
>>>>>> Daemeon C.M. Reiydelle
>>>>>> USA (+1) 415.501.0198
>>>>>> London (+44) (0) 20 8144 9872
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang wrote:
>>>>>>
>>>>>> Writes of 30k/second are the main thing.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle wrote:
>>>>>>
>>>>>> Assuming you meant 100k, that is likely for something with 16mb of
>>>>>> storage (probably way small) where the data is more than 64k, hence it
>>>>>> will not fit into the row cache.
>>>>>>
>>>>>> .......
>>>>>> Daemeon C.M. Reiydelle
>>>>>> USA (+1) 415.501.0198
>>>>>> London (+44) (0) 20 8144 9872
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang wrote:
>>>>>>
>>>>>> I have a cluster of 4 m4.xlarge nodes (4 cpus, 16 GB memory, and 600GB
>>>>>> SSD EBS).
>>>>>>
>>>>>> I can reach a cluster-wide write rate of 30k requests/second and a read
>>>>>> rate of about 100/second. The cluster OS load is constantly above 10. Are
>>>>>> those normal?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Yuan
>>>>>>
>>>>>> --
>>>>>> ————————
>>>>>> Ben Slater
>>>>>> Chief Product Officer
>>>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>>> +61 437 929 798
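[Editor's note: not part of the original thread.] A back-of-the-envelope check on the numbers in the thread: 30k client writes/second across 4 nodes is 7.5k coordinated writes per node, consistent with the "about 8k writes per node" mentioned above. Each write is also replicated, so with a replication factor of 3 (an assumption; the RF is never stated in the thread), every node actually applies roughly three times that many mutations.

```python
cluster_writes_per_s = 30_000
nodes = 4
assumed_rf = 3  # assumption: the replication factor is not stated in the thread

# Client-visible writes handled per node (what Yuan is quoting):
coordinated_per_node = cluster_writes_per_s / nodes
# Mutations each node actually applies once replication is counted:
mutations_per_node = cluster_writes_per_s * assumed_rf / nodes

print(coordinated_per_node)  # 7500.0
print(mutations_per_node)    # 22500.0
```

Under that assumption, the MutationStage counters in the tpstats output are driven by a substantially higher per-node rate than the headline 30k/4 suggests, which helps explain the sustained load.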

<= /div>

READ_REPAIR =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00

=C2=A0

=C2=A0

=C2=A0

=C2=A0

=C2=A0
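The standout line above is Native-Transport-Requests: 128 active, 128 pending, and ~15.9M all-time blocked. A minimal sketch of scanning a `nodetool tpstats` dump for such hot spots (assuming the standard whitespace-separated layout of name, active, pending, completed, blocked, all-time blocked; the sample is abridged from the output above):

```python
# Sketch: flag thread pools with pending or blocked work in a `nodetool tpstats` dump.
SAMPLE = """\
GossipStage                    0   0     345474   0         0
MemtableFlushWriter            0   0       1736   0         0
Native-Transport-Requests    128 128  347508531   2  15891862
"""

def hot_pools(dump: str):
    """Return (name, active, pending, blocked) for saturated pools."""
    flagged = []
    for line in dump.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue  # skip headers / blank lines
        name = parts[0]
        active, pending, blocked = int(parts[1]), int(parts[2]), int(parts[4])
        # Sustained pending or blocked counts mean the stage cannot keep up.
        if pending > 0 or blocked > 0:
            flagged.append((name, active, pending, blocked))
    return flagged

print(hot_pools(SAMPLE))  # → [('Native-Transport-Requests', 128, 128, 2)]
```

Here only the client-request pool is saturated, which points at the coordinator tier rather than flush/compaction machinery.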

On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <ferrarir@gmail.com> wrote:

Hi Yuan,

Your machine instance has 4 vCPUs, that is 4 threads (not cores!); aside from any Cassandra-specific discussion, a system load of 10 on a 4-thread machine is way too much in my opinion. If that is the running average system load, I would look deeper into system details. Is that IO wait? Is that CPU steal? Is it a Cassandra-only instance, or are there other processes pushing the load?
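The figure to watch is load average divided by hardware thread count; a sketch of that arithmetic for the numbers in question (the 1.0 threshold is a common rule of thumb, not a hard limit):

```python
def load_per_thread(loadavg: float, hw_threads: int) -> float:
    """Normalize a system load average by hardware thread count."""
    return loadavg / hw_threads

# m4.xlarge: 4 vCPUs (hardware threads), reported load ~10.
ratio = load_per_thread(10.0, 4)
print(ratio)  # 2.5 -- well above 1.0, i.e. runnable tasks are queueing
```

A sustained ratio above 1.0 means tasks are waiting for a CPU (or stuck in uninterruptible IO, which Linux also counts toward load).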

What does your "nodetool tpstats" say? How many dropped messages do you have?

Best,

On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yuan@kryptoncloud.com> wrote:

Thanks Ben! From the post, it seems they got a slightly better but similar result to mine. Good to know.

I am not sure if a little fine-tuning of heap memory will help or not.

On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <ben.slater@instaclustr.com> wrote:

Hi Yuan,


Although the focus is on Spark and Cassandra and multi-DC, there are also some single-DC benchmarks of m4.xl clusters, plus some discussion of how we went about benchmarking.

Cheers

Ben

On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yuan@kryptoncloud.com> wrote:

Yes, here is my stress test result:

Results:
op rate                   : 12200 [WRITE:12200]
partition rate            : 12200 [WRITE:12200]
row rate                  : 12200 [WRITE:12200]
latency mean              : 16.4 [WRITE:16.4]
latency median            : 7.1 [WRITE:7.1]
latency 95th percentile   : 38.1 [WRITE:38.1]
latency 99th percentile   : 204.3 [WRITE:204.3]
latency 99.9th percentile : 465.9 [WRITE:465.9]
latency max               : 1408.4 [WRITE:1408.4]
Total partitions          : 1000000 [WRITE:1000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:01:21
END
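As a sanity check on figures like the above, total partitions divided by total operation time should land near the reported op rate (a sketch; the op rate is a windowed mean, so only rough agreement is expected):

```python
# Cross-check cassandra-stress totals against its reported op rate.
total_partitions = 1_000_000
elapsed_s = 1 * 60 + 21          # "Total operation time: 00:01:21" = 81 s
reported_op_rate = 12_200

derived = total_partitions / elapsed_s
print(round(derived))            # 12346 ops/s, close to the reported 12200

# Agreement within a few percent means the run was steady, not bursty.
assert abs(derived - reported_op_rate) / reported_op_rate < 0.05
```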


On Thu, Jul 7, 2016 at 2= :49 PM, Ryan Svihla <rs@foundev.pro> wrote:

Lots of variables you're leaving out.

It depends on write size, whether you're using logged batches or not, what consistency level, what RF, whether the writes come in bursts, etc. However, that's all somewhat moot for determining "normal": you really need a baseline, as all those variables end up mattering a huge amount.

I would suggest using cassandra-stress as a baseline and going from there depending on what those numbers say (just pick the defaults).

Sent from my iPhone


On Jul 7, 2016, at 4:39 PM, Yuan Fang <yuan@kryptoncloud.com> wrote:


Yes, it is about 8k writes per node.
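That per-node figure follows from spreading the cluster-wide rate over the nodes (a sketch; assumes writes are evenly balanced across the 4 nodes, which vnodes and a token-aware driver roughly guarantee):

```python
cluster_writes_per_s = 30_000
nodes = 4

per_node = cluster_writes_per_s / nodes
print(per_node)  # 7500.0 writes/s per node, i.e. roughly the quoted "8k"
```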


On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daemeonr@gmail.com> wrote:


Are you saying 7k writes per node, or 30k writes per node?



.......
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yuan@kryptoncloud.com> wrote:

Writes at 30k/second is the main thing.


On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daemeonr@gmail.com> wrote:

Assuming you meant 100k, that is likely for something with 16MB of storage (probably way too small) where the data is more than 64k, hence it will not fit into the row cache.



.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yuan@kryptoncloud.com> wrote:


I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory, and 600GB SSD EBS).


I can reach cluster-wide write requests of 30k/second and read requests of about 100/second. The cluster OS load is constantly above 10. Are those normal?

Thanks!

Best,

Yuan

--
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support





