Subject: Re: Dropped Mutation and Read messages.
From: Oskar Kjellin
Date: Thu, 11 May 2017 19:57:10 +0200
To: Michael Kjellman
Cc: varun saluja, dev@cassandra.apache.org

Indeed, sorry. I'm subscribed to both lists, so I missed which one this was.

Sent from my iPhone

> On 11 May 2017, at 19:56, Michael Kjellman wrote:
>
> This discussion should be on the C* user mailing list. Thanks!
>
> best,
> kjellman
>
>> On May 11, 2017, at 10:53 AM, Oskar Kjellin wrote:
>>
>> That seems way too low. Depending on what type of disk you have, it should be closer to 100-200 MB/s.
>> That's probably what is causing your problems. It would still take a while for you to compact all your data, though.
>>
>> Sent from my iPhone
>>
>>> On 11 May 2017, at 19:50, varun saluja wrote:
>>>
>>> nodetool getcompactionthroughput
>>>
>>> ./nodetool getcompactionthroughput
>>> Current compaction throughput: 16 MB/s
>>>
>>> Regards,
>>> Varun Saluja
>>>
>>>> On 11 May 2017 at 23:18, varun saluja wrote:
>>>> Hi,
>>>>
>>>> Please find the results below. The numbers are scary here.
>>>>
>>>> [root@WA-CASSDB2 bin]# ./nodetool compactionstats
>>>> pending tasks: 137
>>>> compaction type   keyspace         table                 completed    total          unit    progress
>>>> Compaction        system           hints                 5762711108   837522028005   bytes   0.69%
>>>> Compaction        walletkeyspace   user_txn_history_v2   101477894    4722068388     bytes   2.15%
>>>> Compaction        walletkeyspace   user_txn_history_v2   1511866634   753221762663   bytes   0.20%
>>>> Compaction        walletkeyspace   user_txn_history_v2   3664734135   18605501268    bytes   19.70%
>>>> Active compaction remaining time :  26h32m28s
>>>>
>>>>> On 11 May 2017 at 23:15, Oskar Kjellin wrote:
>>>>> What does nodetool compactionstats show?
>>>>>
>>>>> I meant compaction throttling: nodetool getcompactionthroughput
>>>>>
>>>>>> On 11 May 2017, at 19:41, varun saluja wrote:
>>>>>>
>>>>>> Hi Oskar,
>>>>>>
>>>>>> Thanks for the response.
>>>>>>
>>>>>> Yes, we could see a lot of compaction threads. We are loading around 400 GB of data per node on a 3-node Cassandra cluster.
>>>>>> The load job was throttled to write around 7k TPS per node. It ran fine for 2 days, and then we started getting mutation drops, longer GC pauses, and very high load on the system.
>>>>>>
>>>>>> System log reports:
>>>>>> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap
>>>>>>
>>>>>> The job was stopped 12 hours ago, but these failures can still be seen. Can you please let me know how I should proceed? If possible, please suggest some parameters for highly write-intensive jobs.
>>>>>>
>>>>>> Regards,
>>>>>> Varun Saluja
>>>>>>
>>>>>>> On 11 May 2017 at 23:01, Oskar Kjellin wrote:
>>>>>>> Do you have a lot of compactions going on? It sounds like you might've built up a huge backlog. Is your throttling configured properly?
>>>>>>>
>>>>>>>> On 11 May 2017, at 18:50, varun saluja wrote:
>>>>>>>>
>>>>>>>> Hi Experts,
>>>>>>>>
>>>>>>>> Seeking your help on a production issue.
>>>>>>>> We were running a highly write-intensive job on our 3-node Cassandra cluster (version 2.1.7).
>>>>>>>>
>>>>>>>> TPS on the nodes was high. The job ran for more than 2 days, and after that the load average on one of the nodes climbed very high (loadavg: 29).
>>>>>>>>
>>>>>>>> System log reports:
>>>>>>>>
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
>>>>>>>>
>>>>>>>> The job was stopped due to the heavy load, but even after 12 hours we still see dropped mutation messages and sudden increases in load average.
>>>>>>>>
>>>>>>>> Are these hinted handoff mutations? Can we stop them?
>>>>>>>> Strangely, this behaviour is seen on only 2 nodes. Node 1 does not show any load or any such activity.
>>>>>>>>
>>>>>>>> Due to the heavy load and GC, there are intermittent gossip failures among the nodes. Can someone please help?
>>>>>>>>
>>>>>>>> PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and later the issue started again with dropped mutation messages.
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Varun Saluja
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
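The compactionstats output in this thread can be cross-checked with a little arithmetic. The Python sketch below assumes that "Active compaction remaining time" is simply the bytes left in the active `Compaction` rows (total minus completed) divided by the compaction throughput cap, with MB taken as 2^20 bytes. Under that assumption it reproduces the 26h32m28s figure at 16 MB/s, which supports Oskar's point that the cap is too low. The helper names `remaining_bytes` and `remaining_time` are hypothetical and not part of nodetool, and 150 MB/s is only an illustrative value within the range he suggested.

```python
# Sketch: estimate the remaining compaction time shown by `nodetool compactionstats`,
# ASSUMING it equals the bytes left in the active "Compaction" rows divided by the
# throughput cap (MB/s, with MB = 2**20 bytes). The helper names are hypothetical.

# (compaction type, keyspace, table, completed bytes, total bytes),
# copied from the compactionstats output in this thread.
ACTIVE_COMPACTIONS = [
    ("Compaction", "system", "hints", 5762711108, 837522028005),
    ("Compaction", "walletkeyspace", "user_txn_history_v2", 101477894, 4722068388),
    ("Compaction", "walletkeyspace", "user_txn_history_v2", 1511866634, 753221762663),
    ("Compaction", "walletkeyspace", "user_txn_history_v2", 3664734135, 18605501268),
]


def remaining_bytes(rows):
    """Bytes still to be processed by the active 'Compaction' tasks."""
    return sum(total - completed
               for kind, _keyspace, _table, completed, total in rows
               if kind == "Compaction")


def remaining_time(rows, throughput_mb_s):
    """Format the estimate like nodetool does (e.g. '26h32m28s').

    Assumes compaction runs exactly at the cap; returns 'n/a' when the cap
    is 0 (unthrottled) or nothing is left.
    """
    left = remaining_bytes(rows)
    if throughput_mb_s == 0 or left == 0:
        return "n/a"
    secs = left // (throughput_mb_s * 1024 * 1024)
    return f"{secs // 3600}h{(secs % 3600) // 60:02d}m{secs % 60:02d}s"


if __name__ == "__main__":
    print(remaining_bytes(ACTIVE_COMPACTIONS))      # 1603030570553 (~1.6 TB)
    print(remaining_time(ACTIVE_COMPACTIONS, 16))   # 26h32m28s, matches nodetool
    print(remaining_time(ACTIVE_COMPACTIONS, 150))  # 2h49m51s
```

Treat the result as a lower bound for clearing the backlog: it ignores the 137 pending tasks that have not started yet, assumes compaction actually sustains the cap (raising it with `nodetool setcompactionthroughput` only helps if the disks can keep up), and ignores new writes that add more compaction work.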