cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
Date Thu, 10 Aug 2017 11:40:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121506#comment-16121506 ]

Stefan Podkowinski commented on CASSANDRA-11748:
------------------------------------------------


I'm not sure introducing a hard cap on pending outgoing pull requests and simply dropping anything beyond it is the way to go here. The good thing about that approach is that it's pretty much stateless, except for the atomic counter. But we should at least take the schema IDs and/or endpoints into account as well. It just doesn't make sense to queue 50 requests for the same schema ID and then potentially drop requests for a different schema afterwards. Also, as already noted, issuing pulls in parallel is probably not what we want, as this could lead to the described OOM issue when too many responses get queued and applied at the same time. So I think we can't get around managing some more state, such as schema IDs, endpoints, last request time, delay, .., that we can use to schedule pulls more efficiently, by doing one request after another.
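
To make that a bit more concrete, a minimal sketch of such bookkeeping could look like the following. PullScheduler, maybeSchedulePull and requestSchemaFrom are made-up names, not existing MigrationManager APIs; the point is only to show pending endpoints tracked per schema version, with at most one pull in flight per version at a time.

{code:java}
import java.net.InetAddress;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class PullScheduler
{
    // Endpoints we still want to pull each schema version from.
    private final Map<UUID, Deque<InetAddress>> pendingEndpoints = new HashMap<>();
    // Timestamp of the currently in-flight request per schema version.
    private final Map<UUID, Long> inFlightSince = new HashMap<>();
    private final long retryDelayMillis;

    public PullScheduler(long retryDelayMillis)
    {
        this.retryDelayMillis = retryDelayMillis;
    }

    // Called when gossip reports an endpoint with a schema version we don't have yet.
    public synchronized void maybeSchedulePull(UUID version, InetAddress endpoint)
    {
        Deque<InetAddress> endpoints = pendingEndpoints.computeIfAbsent(version, v -> new ArrayDeque<>());
        if (!endpoints.contains(endpoint))
            endpoints.add(endpoint);          // remember it, but don't pull right away
        maybeIssueNext(version);
    }

    // Called once a pulled schema for this version has been merged successfully.
    public synchronized void onPullApplied(UUID version)
    {
        inFlightSince.remove(version);
        pendingEndpoints.remove(version);     // no need to ask anyone else for it
    }

    // Called when a pull failed or timed out; try the next endpoint for that version.
    public synchronized void onPullFailed(UUID version)
    {
        inFlightSince.remove(version);
        maybeIssueNext(version);
    }

    private void maybeIssueNext(UUID version)
    {
        long now = System.currentTimeMillis();
        Long since = inFlightSince.get(version);
        if (since != null && now - since < retryDelayMillis)
            return;                           // one request per version at a time
        Deque<InetAddress> endpoints = pendingEndpoints.get(version);
        if (endpoints == null || endpoints.isEmpty())
            return;
        InetAddress target = endpoints.poll();
        inFlightSince.put(version, now);
        requestSchemaFrom(target, version);   // placeholder for the actual MIGRATION_REQUEST send
    }

    private void requestSchemaFrom(InetAddress endpoint, UUID version)
    {
        System.out.println("pulling schema " + version + " from " + endpoint);
    }
}
{code}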

But we should also not forget to look at the receiver side for incoming pull requests. A node joining the cluster with a schema mismatch should not have to answer each of those in parallel. If we keep track of pending incoming schema requests, we could introduce a delay before responding and create the schema mutations just once, as a payload to be used for all of them. We might have to bump up the MIGRATION_REQUEST timeout a bit in that case, but otherwise just delaying by a few seconds should make a notable difference for nodes joining the cluster that have to answer many migration requests in a short time frame.
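
Again just to sketch the coalescing idea; CoalescingMigrationResponder, ResponseSender and serializeCurrentSchema are invented placeholders and not the actual MigrationRequestVerbHandler API. Requests arriving within the delay window are collected and all answered with a single serialized schema payload.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CoalescingMigrationResponder
{
    public interface ResponseSender { void reply(byte[] schemaPayload); }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long delayMillis;
    private final List<ResponseSender> waiting = new ArrayList<>();
    private boolean flushScheduled = false;

    public CoalescingMigrationResponder(long delayMillis)
    {
        this.delayMillis = delayMillis;
    }

    // Called once per incoming MIGRATION_REQUEST instead of answering it right away.
    public synchronized void onMigrationRequest(ResponseSender sender)
    {
        waiting.add(sender);
        if (!flushScheduled)
        {
            flushScheduled = true;
            scheduler.schedule(this::flush, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    private void flush()
    {
        List<ResponseSender> toAnswer;
        synchronized (this)
        {
            toAnswer = new ArrayList<>(waiting);
            waiting.clear();
            flushScheduled = false;
        }
        // Serialize the schema mutations once and reuse the payload for every waiter.
        byte[] payload = serializeCurrentSchema();
        for (ResponseSender sender : toAnswer)
            sender.reply(payload);
    }

    // Placeholder for the schema mutation serialization normally done per request.
    private byte[] serializeCurrentSchema()
    {
        return new byte[0];
    }
}
{code}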

> Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Rolling upgrade process from 1.2.19 to 2.0.17. 
> CentOS 6.6
> Occurred on different C* nodes in deployments of different scale (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Matt Byrd
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times that a multi-node C* (v2.0.17) cluster ran into OOM at bootstrap during a rolling upgrade from 1.2.19 to 2.0.17.
> Here is the outline of our rolling upgrade process:
> 1. Update the schema on a node, and wait until all nodes are in schema version agreement - via nodetool describecluster
> 2. Restart a Cassandra node
> 3. After the restart, there is a chance that the restarted node has a different schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and any node could run into OOM.
> The following is the system.log output that occurred in one of our 2-node cluster test beds:
> ----------------------------------
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2:
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task 100+ times to the other node:
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... (100+ times)
> ----------------------------------
> On the other hand, Node 1 keeps updating its gossip information, followed by receiving and submitting migration tasks:
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> ... (100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> ... (50+ times)
> On a side note, we have 200+ column families defined in the Cassandra database, which may be related to this amount of RPC traffic.
> P.S.2: The over-requested schema migration tasks will eventually have InternalResponseStage performing the schema merge operation. Since this operation requires a compaction for each merge, it is much slower to consume. Thus, the back-pressure of incoming schema migration content objects consumes all of the heap space and ultimately ends in OOM!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

