cassandra-commits mailing list archives

From "Matt Byrd (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
Date Thu, 10 Aug 2017 19:47:00 GMT


Matt Byrd commented on CASSANDRA-11748:

But we should at least take the schema Ids and/or endpoints into account as well. It just
doesn't make sense to queue 50 requests for the same schema Id and potentially drop requests
for a different schema afterwards.
Yes, I also had a patch with an expiring map of schema version to counter and was limiting
requests per schema version, but I decided to keep it simple, since the single limit sufficed
for the particular scenario. Less relevantly, it also provides some protection in the rather
strange case where there are actually lots of different schema versions in the cluster. I could
resurrect the schema-version patch, but it sounds like we're considering a slightly different
approach.
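For illustration, a minimal sketch of that per-schema-version limit. All names here (SchemaPullLimiter, MAX_REQUESTS_PER_VERSION, tryAcquire) are hypothetical, not actual Cassandra identifiers, and a real implementation would also expire entries rather than keep them forever:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap in-flight migration requests per remote schema
// version, instead of one global cap, so a flood of requests for one version
// cannot starve requests for a different (possibly newer) version.
public class SchemaPullLimiter {
    static final int MAX_REQUESTS_PER_VERSION = 3;

    // schema version (UUID string) -> number of in-flight requests.
    // A real patch would use an expiring map keyed on the version.
    private final ConcurrentHashMap<String, AtomicInteger> inFlight =
            new ConcurrentHashMap<>();

    /** Returns true if we may submit another migration task for this version. */
    public boolean tryAcquire(String schemaVersion) {
        AtomicInteger count =
                inFlight.computeIfAbsent(schemaVersion, v -> new AtomicInteger());
        if (count.incrementAndGet() > MAX_REQUESTS_PER_VERSION) {
            count.decrementAndGet(); // over the cap: undo and refuse
            return false;
        }
        return true;
    }

    /** Called when the migration response arrives (or the request times out). */
    public void release(String schemaVersion) {
        AtomicInteger count = inFlight.get(schemaVersion);
        if (count != null)
            count.decrementAndGet();
    }
}
```

This avoids the failure mode described above, where 50 queued requests for one schema id cause requests for a different schema to be dropped.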

Schedule that pull with a delay instead, give the new node a chance to pull the new schema
from one of the nodes in the cluster. It'll most likely converge by the time the delay has
passed, so we'd just abort the request if schema versions now match.
Once a node has been up for MIGRATION_DELAY_IN_MS and doesn't have an empty schema, it will
always schedule the task to pull schema with a delay of MIGRATION_DELAY_IN_MS and then do
a further check within the task itself to see if the schema versions still differ before asking
for schema.
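A minimal sketch of that delayed pull with the re-check inside the task. The class and method names are illustrative (not Cassandra's actual code), and the delay is passed in rather than fixed, purely for demonstration:

```java
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of the delayed-pull idea: when a gossiped schema version
// differs from ours, schedule the pull after a delay (MIGRATION_DELAY_IN_MS in
// Cassandra) and re-check the versions inside the task, aborting if gossip has
// already converged in the meantime.
public class DelayedSchemaPull {
    private final long delayMs;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't keep the JVM alive for this sketch
                return t;
            });

    public DelayedSchemaPull(long delayMs) {
        this.delayMs = delayMs;
    }

    public void maybeSchedulePull(UUID theirVersion,
                                  Supplier<UUID> ourVersion,
                                  Runnable pullSchema) {
        if (theirVersion.equals(ourVersion.get()))
            return; // already in agreement, nothing to do
        scheduler.schedule(() -> {
            // Re-check inside the task: the schema may have converged
            // while we waited, in which case the pull is aborted.
            if (!theirVersion.equals(ourVersion.get()))
                pullSchema.run();
        }, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

The key point is the second check inside the scheduled task: in the common rolling-upgrade case the versions have converged by the time the delay passes, so no migration request is sent at all.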

Though admittedly this problem does still exist if two nodes start up at the same time: they
may pull from each other.
I suppose we're going to schedule a pull from a newer node too; then, assuming we successively
merge the schemas together, we hopefully end up at the final desired state. Although in the
interim it's possible a node might come into play with a slightly older schema, that can just
happen whenever a DOWN node comes up with out-of-date schema.

It's also possible that, if the node is so overwhelmed by the reverse problem, it won't have
made it to the correct schema version within MIGRATION_DELAY_IN_MS and hence will start sending
its old schema back to all the other nodes in the cluster. Fortunately the sending happens
on the migration stage, so it is single-threaded and less likely to cause OOMs.

> Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade
> -----------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-11748
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Rolling upgrade process from 1.2.19 to 2.0.17. 
> CentOS 6.6
> Occurred in different C* node of different scale of deployment (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Matt Byrd
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x, 4.x
> We have observed multiple times when a multi-node C* (v2.0.17) cluster ran into OOM in
bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. 
> Here is the simple guideline of our rolling upgrade process
> 1. Update schema on a node, and wait until all nodes are in schema version agreement
- via nodetool describecluster
> 2. Restart a Cassandra node
> 3. After restart, there is a chance that the restarted node has a different schema
> 4. All nodes in the cluster start to rapidly exchange schema information, and any node
could run into OOM. 
> The following is the system.log that occur in one of our 2-node cluster test bed
> ----------------------------------
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 (line
328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 (line
328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2, 
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 (line 328) Gossiping
my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task 100+ times to the other node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 (line 1011) Node /
has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 (line 414) Updating topology
for /
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 (line 102) Submitting
migration task for /
> ... ( over 100+ times)
> ----------------------------------
> On the other hand, Node 1 keeps updating its gossip information, followed by receiving
and submitting migrationTasks afterwards: 
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 (line 978) InetAddress
/ is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 (line
41) Received migration request from /
> …… ( over 100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 (line 127) submitting
migration task for /
> .....  (over 50+ times)
> On a side note, we have over 200 column families defined in the Cassandra database, which
may be related to this amount of RPC traffic.
> P.S.2 The over-requested schema migration tasks will eventually have InternalResponseStage
performing the schema merge operation. Since this operation requires a compaction for each merge,
it is much slower to consume, so the backlog of incoming schema migration content
objects consumes all of the heap space and ultimately ends in OOM!

This message was sent by Atlassian JIRA
