From: Brian Spindler <brian.spindler@gmail.com>
Date: Wed, 8 Aug 2018 10:39:43 -0400
Subject: Re: TWCS Compaction backed up
To: user@cassandra.apache.org

Hi Jeff/Jon et al, here is what I'm thinking to do to clean up; please let
me know what you think.

This is precisely my problem, I believe:
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html

With this I have a lot of wasted space due to a bad incremental repair, so I
am thinking to abandon incremental repairs by:

- setting all repairedAt values back to 0 on any/all *Data.db SSTables
  (rough sketch at the end of this mail)
- using either range_repair.py or Reaper to run sub-range repairs

Will this clean everything up?

On Tue, Aug 7, 2018 at 9:18 PM Brian Spindler wrote:

> In fact all of them say Repaired at: 0.
>
> On Tue, Aug 7, 2018 at 9:13 PM Brian Spindler wrote:
>
>> Hi, I spot checked a couple of the files that were ~200MB and they mostly
>> had "Repaired at: 0", so maybe that's not it?
>>
>> -B
>>
>> On Tue, Aug 7, 2018 at 8:16 PM <brian.spindler@gmail.com> wrote:
>>
>>> Everything is ttl'd.
>>>
>>> I suppose I could use sstablemetadata to see the repaired bit; could I
>>> just set that to unrepaired somehow and that would fix it?
>>>
>>> Thanks!
>>>
>>> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa wrote:
>>>
>>> May be worth seeing if any of the sstables got promoted to repaired - if
>>> so, they're not eligible for compaction with unrepaired sstables, and
>>> that could explain some higher counts.
>>>
>>> Do you actually do deletes, or is everything ttl'd?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>> On Aug 7, 2018, at 5:09 PM, Brian Spindler wrote:
>>>
>>> Hi Jeff, mostly lots of little files: there will be 4-5 that are
>>> 1-1.5 GB or so, and then many at 5-50 MB and many at 40-50 MB each.
>>>
>>> Re incremental repair: yes, one of my engineers started an incremental
>>> repair on this column family that we had to abort. In fact, the node
>>> that the repair was initiated on ran out of disk space and we ended up
>>> replacing that node like a dead node.
>>>
>>> Oddly, the new node is experiencing this issue as well.
>>>
>>> -B
>>>
>>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa wrote:
>>>
>>>> You could toggle off the tombstone compaction to see if that helps, but
>>>> that should be lower priority than normal compactions.
>>>>
>>>> Are the lots-of-little-files from memtable flushes or
>>>> repair/anticompaction?
>>>>
>>>> Do you do normal deletes? Did you try to run incremental repair?
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler wrote:
>>>>
>>>> Hi Jonathan, both I believe.
>>>>
>>>> The window size is 1 day; full settings:
>>>>
>>>>     AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>>>       'unchecked_tombstone_compaction': 'true',
>>>>       'compaction_window_size': '1',
>>>>       'compaction_window_unit': 'DAYS',
>>>>       'tombstone_compaction_interval': '86400',
>>>>       'tombstone_threshold': '0.2',
>>>>       'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>>
>>>> nodetool tpstats
>>>>
>>>> Pool Name                  Active  Pending    Completed  Blocked  All time blocked
>>>> MutationStage                   0        0  68582241832        0                 0
>>>> ReadStage                       0        0    209566303        0                 0
>>>> RequestResponseStage            0        0  44680860850        0                 0
>>>> ReadRepairStage                 0        0     24562722        0                 0
>>>> CounterMutationStage            0        0            0        0                 0
>>>> MiscStage                       0        0            0        0                 0
>>>> HintedHandoff                   1        1          203        0                 0
>>>> GossipStage                     0        0      8471784        0                 0
>>>> CacheCleanupExecutor            0        0          122        0                 0
>>>> InternalResponseStage           0        0       552125        0                 0
>>>> CommitLogArchiver               0        0            0        0                 0
>>>> CompactionExecutor              8       42      1433715        0                 0
>>>> ValidationExecutor              0        0         2521        0                 0
>>>> MigrationStage                  0        0       527549        0                 0
>>>> AntiEntropyStage                0        0         7697        0                 0
>>>> PendingRangeCalculator          0        0           17        0                 0
>>>> Sampler                         0        0            0        0                 0
>>>> MemtableFlushWriter             0        0       116966        0                 0
>>>> MemtablePostFlush               0        0       209103        0                 0
>>>> MemtableReclaimMemory           0        0       116966        0                 0
>>>> Native-Transport-Requests       1        0   1715937778        0            176262
>>>>
>>>> Message type      Dropped
>>>> READ                    2
>>>> RANGE_SLICE             0
>>>> _TRACE                  0
>>>> MUTATION             4390
>>>> COUNTER_MUTATION        0
>>>> BINARY                  0
>>>> REQUEST_RESPONSE     1882
>>>> PAGED_RANGE             0
>>>> READ_REPAIR             0
>>>>
>>>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <jon@jonhaddad.com> wrote:
>>>>
>>>>> What's your window size?
>>>>>
>>>>> When you say backed up, how are you measuring that? Are there pending
>>>>> tasks, or do you just see more files than you expect?
>>>>>
>>>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spindler@gmail.com> wrote:
>>>>>
>>>>>> Hey guys, quick question:
>>>>>>
>>>>>> I've got a v2.1 Cassandra cluster, 12 nodes on AWS i3.2xl, commit log
>>>>>> on one drive, data on NVMe. That was working very well; it's a
>>>>>> time-series DB and has been accumulating data for about 4 weeks.
>>>>>>
>>>>>> The nodes have increased in load and compaction seems to be falling
>>>>>> behind. I used to get about one file per day for this column family,
>>>>>> roughly a 30 GB Data.db file per day. I am now getting hundreds per
>>>>>> day at 1 MB - 50 MB.
>>>>>>
>>>>>> How do I recover from this?
>>>>>>
>>>>>> I can scale out to give some breathing room, but will it go back and
>>>>>> compact the old days into nicely packed files for the day?
>>>>>>
>>>>>> I tried setting compaction throughput to 1000 from 256 and it seemed
>>>>>> to make things worse for the CPU; it's configured on i3.2xl with 8
>>>>>> compaction threads.
>>>>>>
>>>>>> -B
>>>>>>
>>>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I
>>>>>> think) to get rid of old tombstones. However, running repairs in 2.1
>>>>>> on TWCS column families causes a very large spike in sstable counts
>>>>>> due to anti-compaction, which causes a lot of disruption. Is there
>>>>>> any other way?
>>>>>
>>>>> --
>>>>> Jon Haddad
>>>>> http://www.rustyrazorblade.com
>>>>> twitter: rustyrazorblade
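
P.S. For the repairedAt reset mentioned at the top, this is roughly what I
have in mind - only a sketch, assuming the node is stopped first, the
sstablemetadata/sstablerepairedset tools from Cassandra's tools/bin are on
the path, and the data path below is just illustrative:

    # Check the current flag; sstablemetadata prints a "Repaired at:" line per SSTable.
    sstablemetadata /path/to/data/<keyspace>/<table>/*-Data.db | grep "Repaired at"

    # With the node down, mark every SSTable of the table unrepaired (repairedAt back to 0),
    # then bring the node back up before starting the sub-range repairs.
    sstablerepairedset --really-set --is-unrepaired /path/to/data/<keyspace>/<table>/*-Data.db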
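
And on Jon's earlier question about how I'm measuring "backed up": mostly
pending compactions and the per-table SSTable count, roughly like this (the
keyspace/table name and the throughput value are only examples):

    # Pending compaction tasks and the per-table SSTable count are what I'm watching.
    nodetool compactionstats
    nodetool cfstats <keyspace>.<columnfamily> | grep "SSTable count"

    # The compaction throttle can be changed on the fly (value in MB/s; 0 removes the limit).
    nodetool setcompactionthroughput 256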