cassandra-user mailing list archives

From Carl Mueller <carl.muel...@smartthings.com.INVALID>
Subject Re: 2.1.9 --> 2.2.13 upgrade node startup after upgrade very slow
Date Wed, 17 Apr 2019 18:31:10 GMT
No, we only did the package upgrade 2.1.9 --> 2.2.13.

It definitely feels like some indexes are being recalculated or the entire
sstables are being scanned due to suspected corruption.


On Wed, Apr 17, 2019 at 12:32 PM Jeff Jirsa <jjirsa@gmail.com> wrote:

> There was a time when changing some of the parameters (especially bloom
> filter FP ratio) would cause the bloom filters to be rebuilt on startup if
> the sstables didn't match what was in the schema, leading to a delay like
> that and similar logs. Any chance you changed the schema on that table
> since the last time you restarted it?
>
>
>
> On Wed, Apr 17, 2019 at 10:30 AM Carl Mueller
> <carl.mueller@smartthings.com.invalid> wrote:
>
>> Oh, the table in question is SizeTiered and had about 10 sstables total. It
>> was JBOD across two data directories.
>>
>> On Wed, Apr 17, 2019 at 12:26 PM Carl Mueller <
>> carl.mueller@smartthings.com> wrote:
>>
>>> We are doing a ton of upgrades to get out of 2.1.x. We've done probably
>>> 20-30 clusters so far and have not encountered anything like this yet.
>>>
>>> After upgrading a node, the restart takes a long time, about 10 minutes.
>>> Almost all of our other nodes took less than 2 minutes to upgrade
>>> (aside from sstableupgrades).
>>>
>>> The startup stalls on one particular table. It is the largest table, at
>>> about 300GB, but we have upgraded other clusters with about that much data
>>> without this 8-10 minute delay. We have the ability to roll back the node,
>>> and the restart as a 2.1.x node is normal with no delays.
>>>
>>> Alas, this is a prod cluster, so we are going to try to load the sstables
>>> into a lower environment and try to replicate the delay. If we can, we
>>> will turn on debug logging.
>>>
>>> This occurred on the first node we tried to upgrade. It is possible it
>>> is limited to only this node, but we are gun-shy about playing around
>>> with upgrades in prod.
>>>
>>> We have an automated upgrade program that flushes, snapshots, shuts
>>> down gossip, drains before the upgrade, and suppresses autostart on
>>> upgrade. It has worked about as flawlessly as one could hope for so far
>>> for 2.1 -> 2.2 and 2.2 -> 3.11 upgrades.
>>>
>>> INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 -
>>> Initializing zzzz.access_token
>>> INFO  [main] 2019-04-16 17:22:17,096 ColumnFamilyStore.java:389 -
>>> Initializing zzzz.refresh_token
>>> INFO  [main] 2019-04-16 17:28:52,929 ColumnFamilyStore.java:389 -
>>> Initializing zzzz.userid
>>> INFO  [main] 2019-04-16 17:28:52,930 ColumnFamilyStore.java:389 -
>>> Initializing zzzz.access_token_by_auth
>>>
>>> You can see the roughly 6.5-minute delay (17:22:17 to 17:28:52) in the
>>> startup log above. All the other keyspace/tables initialize in under a
>>> second.
>>>
>>>
>>>
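
For anyone trying to find a similar stall in their own system.log, the gap between consecutive "Initializing" lines can be measured with a short script. This is only a rough sketch: it assumes the log lines are unwrapped (one entry per line, unlike the quoted email above) and use the timestamp format shown in the thread. The names `INIT_RE`, `init_gaps` and `SAMPLE` are made up for illustration. It charges each gap to the table logged just before it, which is an assumption about where the time is spent, not something the log states.

```python
import re
from datetime import datetime

# Matches unwrapped startup lines such as:
# INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 - Initializing zzzz.access_token
INIT_RE = re.compile(
    r"\[main\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"ColumnFamilyStore\.java:\d+\s+-\s+Initializing\s+(\S+)"
)


def init_gaps(lines):
    """Return (table, seconds) pairs sorted by the largest gap first.

    Each gap is the time between a table's "Initializing" line and the
    next one, attributed (by assumption) to the earlier table.
    """
    events = []
    for line in lines:
        m = INIT_RE.search(line)
        if m:
            # %f accepts the 3-digit millisecond field after the comma.
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(2)))

    gaps = []
    for (t0, table), (t1, _next_table) in zip(events, events[1:]):
        gaps.append((table, (t1 - t0).total_seconds()))
    return sorted(gaps, key=lambda g: g[1], reverse=True)


# The four log entries quoted in the thread, joined back onto single lines.
SAMPLE = [
    "INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 - Initializing zzzz.access_token",
    "INFO  [main] 2019-04-16 17:22:17,096 ColumnFamilyStore.java:389 - Initializing zzzz.refresh_token",
    "INFO  [main] 2019-04-16 17:28:52,929 ColumnFamilyStore.java:389 - Initializing zzzz.userid",
    "INFO  [main] 2019-04-16 17:28:52,930 ColumnFamilyStore.java:389 - Initializing zzzz.access_token_by_auth",
]

if __name__ == "__main__":
    for table, seconds in init_gaps(SAMPLE):
        print(f"{table}: {seconds:.3f}s")
```

On the four lines from the thread, the largest gap is about 395.8 seconds and follows the zzzz.refresh_token line, which matches the stall described above.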
