cassandra-user mailing list archives

From Brian Spindler <brian.spind...@gmail.com>
Subject Re: TWCS Compaction backed up
Date Wed, 08 Aug 2018 01:13:32 GMT
Hi, I spot-checked a couple of the files that were ~200MB and they mostly
had "Repaired at: 0", so maybe that's not it?

-B
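
The spot check above can be extended to every sstable in the table. This is a minimal sketch, assuming the text printed by `sstablemetadata` for each file has already been captured and contains a `Repaired at: <number>` line like the one quoted above (0 meaning unrepaired). The function names and the sample paths are hypothetical, for illustration only.

```python
import re


def parse_repaired_at(metadata_text):
    """Return the integer after 'Repaired at:' in the captured text, or None."""
    match = re.search(r"^Repaired at:\s*(\d+)\s*$", metadata_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def split_by_repair_state(outputs):
    """Group sstable paths by repair state.

    `outputs` maps an sstable path to the metadata text captured for it
    (an assumed input shape, not something the tool produces directly).
    Returns three sorted lists: repaired, unrepaired, unknown.
    """
    repaired, unrepaired, unknown = [], [], []
    for path, text in sorted(outputs.items()):
        value = parse_repaired_at(text)
        if value is None:
            unknown.append(path)      # field missing from the captured text
        elif value == 0:
            unrepaired.append(path)   # 0 means the sstable is unrepaired
        else:
            repaired.append(path)     # non-zero value: marked as repaired
    return repaired, unrepaired, unknown


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the cluster in this thread.
    sample = {
        "ks-cf-ka-101-Data.db": "Repaired at: 0\n",
        "ks-cf-ka-102-Data.db": "Repaired at: 1533600000000\n",
    }
    repaired, unrepaired, unknown = split_by_repair_state(sample)
    print("repaired:", repaired)
    print("unrepaired:", unrepaired)
    print("unknown:", unknown)
```

If the repaired list is non-empty, those files would not compact together with the unrepaired ones, which is the situation Jeff describes below.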


On Tue, Aug 7, 2018 at 8:16 PM <brian.spindler@gmail.com> wrote:

> Everything is ttl’d
>
> I suppose I could use sstablemetadata to see the repaired bit. Could I just
> set that to unrepaired somehow, and would that fix it?
>
> Thanks!
>
> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>
> May be worth seeing if any of the sstables got promoted to repaired - if
> so they’re not eligible for compaction with unrepaired sstables and that
> could explain some higher counts
>
> Do you actually do deletes or is everything ttl’d?
>
>
> --
> Jeff Jirsa
>
>
> On Aug 7, 2018, at 5:09 PM, Brian Spindler <brian.spindler@gmail.com>
> wrote:
>
> Hi Jeff, mostly lots of little files, like there will be 4-5 that are
> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.
>
> Re incremental repair: yes, one of my engineers started an incremental
> repair on this column family that we had to abort.  In fact, the node that
> the repair was initiated on ran out of disk space and we ended up replacing
> that node like a dead node.
>
> Oddly the new node is experiencing this issue as well.
>
> -B
>
>
> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jjirsa@gmail.com> wrote:
>
>> You could toggle off the tombstone compaction to see if that helps, but
>> that should be lower priority than normal compactions
>>
>> Are the lots-of-little-files from memtable flushes or
>> repair/anticompaction?
>>
>> Do you do normal deletes? Did you try to run Incremental repair?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spindler@gmail.com>
>> wrote:
>>
>> Hi Jonathan, both I believe.
>>
>> The window size is 1 day, full settings:
>>     AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>         'unchecked_tombstone_compaction': 'true',
>>         'compaction_window_size': '1',
>>         'compaction_window_unit': 'DAYS',
>>         'tombstone_compaction_interval': '86400',
>>         'tombstone_threshold': '0.2',
>>         'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>
>>
>> nodetool tpstats
>>
>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>> MutationStage                     0         0    68582241832         0                 0
>> ReadStage                         0         0      209566303         0                 0
>> RequestResponseStage              0         0    44680860850         0                 0
>> ReadRepairStage                   0         0       24562722         0                 0
>> CounterMutationStage              0         0              0         0                 0
>> MiscStage                         0         0              0         0                 0
>> HintedHandoff                     1         1            203         0                 0
>> GossipStage                       0         0        8471784         0                 0
>> CacheCleanupExecutor              0         0            122         0                 0
>> InternalResponseStage             0         0         552125         0                 0
>> CommitLogArchiver                 0         0              0         0                 0
>> CompactionExecutor                8        42        1433715         0                 0
>> ValidationExecutor                0         0           2521         0                 0
>> MigrationStage                    0         0         527549         0                 0
>> AntiEntropyStage                  0         0           7697         0                 0
>> PendingRangeCalculator            0         0             17         0                 0
>> Sampler                           0         0              0         0                 0
>> MemtableFlushWriter               0         0         116966         0                 0
>> MemtablePostFlush                 0         0         209103         0                 0
>> MemtableReclaimMemory             0         0         116966         0                 0
>> Native-Transport-Requests         1         0     1715937778         0            176262
>>
>> Message type           Dropped
>> READ                         2
>> RANGE_SLICE                  0
>> _TRACE                       0
>> MUTATION                  4390
>> COUNTER_MUTATION             0
>> BINARY                       0
>> REQUEST_RESPONSE          1882
>> PAGED_RANGE                  0
>> READ_REPAIR                  0
>>
>>
>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <jon@jonhaddad.com> wrote:
>>
>>> What's your window size?
>>>
>>> When you say backed up, how are you measuring that?  Are there pending
>>> tasks or do you just see more files than you expect?
>>>
>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spindler@gmail.com>
>>> wrote:
>>>
>>>> Hey guys, quick question:
>>>>
>>>> I've got a v2.1 Cassandra cluster, 12 nodes on AWS i3.2xl, commit log
>>>> on one drive, data on NVMe.  That was working very well; it's a
>>>> time-series DB and has been accumulating data for about 4 weeks.
>>>>
>>>> The nodes have increased in load and compaction seems to be falling
>>>> behind.  I used to get about 1 file per day for this column family, a
>>>> ~30GB Data.db file per day.  I am now getting hundreds per day at
>>>> 1MB-50MB each.
>>>>
>>>> How to recover from this?
>>>>
>>>> I can scale out to give some breathing room but will it go back and
>>>> compact the old days into nicely packed files for the day?
>>>>
>>>> I tried setting compaction throughput to 1000 from 256 and it seemed to
>>>> make things worse for the CPU, it's configured on i3.2xl with 8 compaction
>>>> threads.
>>>>
>>>> -B
>>>>
>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>>>> to get rid of old tombstones, however running repairs in 2.1 on TWCS column
>>>> families causes a very large spike in sstable counts due to anti-compaction
>>>> which causes a lot of disruption, is there any other way?
>>>>
>>>>
>>>>
>>>
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> twitter: rustyrazorblade
>>>
>>
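
The `nodetool tpstats` output quoted in the thread is what shows compaction falling behind (CompactionExecutor: 8 active, 42 pending). As a rough sketch of how that check could be automated, the snippet below scans captured tpstats text for pools with a non-zero Pending count. It assumes each pool row sits on a single line with five numeric columns after the pool name; the function name is hypothetical, and the sample rows are copied from the output in this thread.

```python
def pools_with_backlog(tpstats_text):
    """Return {pool_name: (active, pending)} for pools whose Pending count is > 0.

    Assumes rows of the form:
        <PoolName> <Active> <Pending> <Completed> <Blocked> <AllTimeBlocked>
    Header lines and the dropped-message section do not match and are skipped.
    """
    backlog = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # A pool row is one name followed by exactly five integer columns.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, active, pending = fields[0], int(fields[1]), int(fields[2])
        if pending > 0:
            backlog[name] = (active, pending)
    return backlog


if __name__ == "__main__":
    sample = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0    68582241832         0                 0
HintedHandoff                     1         1            203         0                 0
CompactionExecutor                8        42        1433715         0                 0
Native-Transport-Requests         1         0     1715937778         0            176262
"""
    for pool, (active, pending) in pools_with_backlog(sample).items():
        print(f"{pool}: active={active} pending={pending}")
```

Running this periodically and watching whether the CompactionExecutor pending count shrinks would be one way to tell if a change (more nodes, different throughput) is actually helping.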
