cassandra-user mailing list archives

From Brian Spindler <brian.spind...@gmail.com>
Subject Re: TWCS Compaction backed up
Date Wed, 08 Aug 2018 01:18:53 GMT
In fact all of them say Repaired at: 0.
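
A minimal sketch for checking this across every sstable rather than spot
checking. It assumes the text printed by sstablemetadata for each Data.db
file has already been captured, and that it contains a "Repaired at: <millis>"
line like the one quoted above. The function names and the dict of captured
outputs are hypothetical, for illustration only.

```python
import re
from typing import Dict, List, Optional

# Matches a line such as "Repaired at: 0" (assumed output format).
REPAIRED_AT = re.compile(r"^Repaired at:\s*(\d+)\s*$", re.MULTILINE)


def repaired_at(metadata_text: str) -> Optional[int]:
    """Return the 'Repaired at' value from one sstable's metadata text, or None."""
    match = REPAIRED_AT.search(metadata_text)
    return int(match.group(1)) if match else None


def split_by_repair_state(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sstable paths by repair state.

    `outputs` maps a (hypothetical) sstable path to its captured metadata text.
    A value of 0 means unrepaired; any other value is treated as repaired.
    """
    groups: Dict[str, List[str]] = {"repaired": [], "unrepaired": [], "unknown": []}
    for path, text in sorted(outputs.items()):
        value = repaired_at(text)
        if value is None:
            groups["unknown"].append(path)
        elif value == 0:
            groups["unrepaired"].append(path)
        else:
            groups["repaired"].append(path)
    return groups
```

If the "repaired" list comes back empty, as it seems to here, a mix of
repaired and unrepaired sstables is probably not what is holding up
compaction.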

On Tue, Aug 7, 2018 at 9:13 PM Brian Spindler <brian.spindler@gmail.com>
wrote:

> Hi, I spot checked a couple of the files that were ~200MB and they mostly
> had "Repaired at: 0" so maybe that's not it?
>
> -B
>
>
> On Tue, Aug 7, 2018 at 8:16 PM <brian.spindler@gmail.com> wrote:
>
>> Everything is ttl’d
>>
>> I suppose I could use sstablemetadata to see the repaired bit. Could I just
>> set that to unrepaired somehow, and would that fix it?
>>
>> Thanks!
>>
>> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>
>> May be worth seeing if any of the sstables got promoted to repaired - if
>> so they’re not eligible for compaction with unrepaired sstables and that
>> could explain some higher counts
>>
>> Do you actually do deletes or is everything ttl’d?
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 7, 2018, at 5:09 PM, Brian Spindler <brian.spindler@gmail.com>
>> wrote:
>>
>> Hi Jeff, mostly lots of little files, like there will be 4-5 that are
>> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.
>>
>> Re incremental repair: yes, one of my engineers started an incremental
>> repair on this column family that we had to abort.  In fact, the node that
>> the repair was initiated on ran out of disk space and we ended up replacing
>> that node like a dead node.
>>
>> Oddly the new node is experiencing this issue as well.
>>
>> -B
>>
>>
>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jjirsa@gmail.com> wrote:
>>
>>> You could toggle off the tombstone compaction to see if that helps, but
>>> that should be lower priority than normal compactions
>>>
>>> Are the lots-of-little-files from memtable flushes or
>>> repair/anticompaction?
>>>
>>> Do you do normal deletes? Did you try to run Incremental repair?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spindler@gmail.com>
>>> wrote:
>>>
>>> Hi Jonathan, both I believe.
>>>
>>> The window size is 1 day, full settings:
>>>     AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
>>> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
>>> 'tombstone_threshold': '0.2', 'class':
>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>
>>>
>>> nodetool tpstats
>>>
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     0         0    68582241832         0                 0
>>> ReadStage                         0         0      209566303         0                 0
>>> RequestResponseStage              0         0    44680860850         0                 0
>>> ReadRepairStage                   0         0       24562722         0                 0
>>> CounterMutationStage              0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> HintedHandoff                     1         1            203         0                 0
>>> GossipStage                       0         0        8471784         0                 0
>>> CacheCleanupExecutor              0         0            122         0                 0
>>> InternalResponseStage             0         0         552125         0                 0
>>> CommitLogArchiver                 0         0              0         0                 0
>>> CompactionExecutor                8        42        1433715         0                 0
>>> ValidationExecutor                0         0           2521         0                 0
>>> MigrationStage                    0         0         527549         0                 0
>>> AntiEntropyStage                  0         0           7697         0                 0
>>> PendingRangeCalculator            0         0             17         0                 0
>>> Sampler                           0         0              0         0                 0
>>> MemtableFlushWriter               0         0         116966         0                 0
>>> MemtablePostFlush                 0         0         209103         0                 0
>>> MemtableReclaimMemory             0         0         116966         0                 0
>>> Native-Transport-Requests         1         0     1715937778         0            176262
>>>
>>> Message type           Dropped
>>> READ                         2
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> MUTATION                  4390
>>> COUNTER_MUTATION             0
>>> BINARY                       0
>>> REQUEST_RESPONSE          1882
>>> PAGED_RANGE                  0
>>> READ_REPAIR                  0
>>>
>>>
>>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <jon@jonhaddad.com>
>>> wrote:
>>>
>>>> What's your window size?
>>>>
>>>> When you say backed up, how are you measuring that?  Are there pending
>>>> tasks or do you just see more files than you expect?
>>>>
>>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spindler@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey guys, quick question:
>>>>>
>>>>> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log
>>>>> on one drive, data on nvme.  That was working very well, it's a ts db
>>>>> and has been accumulating data for about 4 weeks.
>>>>>
>>>>> The nodes have increased in load and compaction seems to be falling
>>>>> behind.  I used to get about 1 file per day for this column family, a
>>>>> ~30GB Data.db file per day.  I am now getting hundreds per day at 1MB to
>>>>> 50MB.
>>>>>
>>>>> How to recover from this?
>>>>>
>>>>> I can scale out to give some breathing room but will it go back and
>>>>> compact the old days into nicely packed files for the day?
>>>>>
>>>>> I tried setting compaction throughput to 1000 from 256 and it seemed
>>>>> to make things worse for the CPU.  It's configured on i3.2xl with 8
>>>>> compaction threads.
>>>>>
>>>>> -B
>>>>>
>>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I
>>>>> think) to get rid of old tombstones.  However, running repairs in 2.1 on
>>>>> TWCS column families causes a very large spike in sstable counts due to
>>>>> anti-compaction, which causes a lot of disruption.  Is there any other
>>>>> way?
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
>>>>
>>>
