Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Thu, 10 Aug 2017 11:40:00 +0000 (UTC)
From: "Stefan Podkowinski (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12967282.1462937259000.137489.1502365200487@Atlassian.JIRA>
In-Reply-To: <JIRA.12967282.1462937259000@Atlassian.JIRA>
References: <JIRA.12967282.1462937259000@Atlassian.JIRA> <JIRA.12967282.1462937259937@jira-lw-us.apache.org>
Subject: [jira] [Commented] (CASSANDRA-11748) Schema version mismatch may
 leads to Casandra OOM at bootstrap during a rolling upgrade process
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Thu, 10 Aug 2017 11:40:07 -0000


    [ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121506#comment-16121506 ]

Stefan Podkowinski commented on CASSANDRA-11748:
------------------------------------------------


I'm not sure introducing a hard cap on pending outgoing pull requests and simply dropping anything beyond that is the way to go here. The good thing about the approach is that it's pretty much stateless, except for the atomic counter. But we should at least take the schema Ids and/or endpoints into account as well. It just doesn't make sense to queue 50 requests for the same schema Id and potentially drop requests for a different schema afterwards. Also, as already noted, issuing pulls in parallel is probably not what we want, as this could lead to the described OOM issue when too many responses get queued and applied at the same time. So I think we won't get around managing some more state, such as schema Ids, endpoints, last request time, delay, etc., that we can use to schedule pulls in a more efficient way, by doing one request after another.
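
To make the idea more concrete, here is a minimal sketch of what such scheduling state could look like. This is not existing Cassandra code: the class SchemaPullScheduler, its nested Pull type and all method names are hypothetical and only serve as illustration. It keeps candidate endpoints per schema Id and hands out at most one pull at a time; last request time and retry delay are left out to keep it short.

```java
import java.net.InetSocketAddress;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical sketch: tracks unknown schema versions and hands out one pull at a time. */
final class SchemaPullScheduler {

    /** A single pull to issue: which schema version to fetch, and from which endpoint. */
    static final class Pull {
        final UUID schemaId;
        final InetSocketAddress endpoint;

        Pull(UUID schemaId, InetSocketAddress endpoint) {
            this.schemaId = schemaId;
            this.endpoint = endpoint;
        }
    }

    // Candidate endpoints per unknown schema version, in the order they were reported.
    private final Map<UUID, Deque<InetSocketAddress>> candidates = new LinkedHashMap<>();

    // At most one outstanding request at any time.
    private Pull inFlight;

    /** Record that an endpoint announced a schema version we do not have yet. */
    synchronized void report(UUID schemaId, InetSocketAddress endpoint) {
        Deque<InetSocketAddress> queue = candidates.computeIfAbsent(schemaId, k -> new ArrayDeque<>());
        if (!queue.contains(endpoint)) {
            queue.add(endpoint); // repeated reports for the same pair are ignored
        }
    }

    /** Returns the next pull to send, or empty if one is in flight or nothing is pending. */
    synchronized Optional<Pull> next() {
        if (inFlight != null) {
            return Optional.empty();
        }
        for (Map.Entry<UUID, Deque<InetSocketAddress>> entry : candidates.entrySet()) {
            InetSocketAddress endpoint = entry.getValue().poll();
            if (endpoint != null) {
                inFlight = new Pull(entry.getKey(), endpoint);
                return Optional.of(inFlight);
            }
        }
        return Optional.empty();
    }

    /** The schema version was merged: forget all remaining candidates for it. */
    synchronized void completed(UUID schemaId) {
        candidates.remove(schemaId);
        if (inFlight != null && inFlight.schemaId.equals(schemaId)) {
            inFlight = null;
        }
    }

    /** The request failed or timed out: allow the next candidate to be tried. */
    synchronized void failed(Pull pull) {
        if (pull == inFlight) {
            inFlight = null;
        }
    }
}
```

With something along these lines, 50 reports for the same schema Id would result in a single outstanding request, and a different schema Id would still get its turn once that request completes or fails.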

But we should also not forget to look at the receiver side for incoming pull requests. Joining the cluster with a schema mismatch should not cause a node to answer each of those in parallel. If we keep track of pending incoming schema requests, we could introduce a delay before responding and create the schema mutations just once as payload to be used for all of them. We might have to bump up the MIGRATION_REQUEST timeout in that case, but otherwise just delaying a few seconds should make a notable difference for nodes joining the cluster and having to answer many migration requests in a short time frame.
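
As a rough illustration of the receiver side, the following sketch queues incoming requests and answers a whole batch with a payload that is built only once. Again, this is an assumption about how it could be structured, not existing Cassandra code: CoalescingSchemaResponder, enqueue and flush are hypothetical names, and the payload type is kept generic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: answers all queued schema requests with a payload that is built once. */
final class CoalescingSchemaResponder<P> {

    // Expensive step, e.g. serialising the local schema into mutations.
    private final Supplier<P> payloadBuilder;

    // Reply callbacks of requesters that are still waiting for an answer.
    private final List<Consumer<P>> waiting = new ArrayList<>();

    CoalescingSchemaResponder(Supplier<P> payloadBuilder) {
        this.payloadBuilder = payloadBuilder;
    }

    /**
     * Queues a requester. Returns true if it is the first one of a new batch,
     * in which case the caller is expected to schedule flush() after the delay.
     */
    synchronized boolean enqueue(Consumer<P> reply) {
        waiting.add(reply);
        return waiting.size() == 1;
    }

    /** Builds the payload once and sends it to every queued requester. Returns the batch size. */
    int flush() {
        List<Consumer<P>> batch;
        synchronized (this) {
            if (waiting.isEmpty()) {
                return 0;
            }
            batch = new ArrayList<>(waiting);
            waiting.clear();
        }
        P payload = payloadBuilder.get(); // built outside the lock, once per batch
        for (Consumer<P> reply : batch) {
            reply.accept(payload);
        }
        return batch.size();
    }
}
```

The delay itself is deliberately left to the caller here: whenever enqueue returns true, the request handler would schedule a single flush() a few seconds later, for example on a ScheduledExecutorService. That delay has to stay below whatever timeout the requesting side applies.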

> Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
> CentOS 6.6
> Occurred on different C* nodes in deployments of different scale (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Matt Byrd
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times that a multi-node C* (v2.0.17) cluster ran into OOM at bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17.
> Here is a simple outline of our rolling upgrade process:
> 1. Update the schema on a node, and wait until all nodes are in schema version agreement - via nodetool describecluster
> 2. Restart a Cassandra node
> 3. After the restart, there is a chance that the restarted node has a different schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and any of the nodes could run into OOM.
> The following is the system.log output that occurred in one of our 2-node cluster test beds
> ----------------------------------
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2,
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task to the other node, 100+ times.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... ( over 100+ times)
> ----------------------------------
> On the other hand, Node 1 keeps updating its gossip information, followed by receiving and submitting migration tasks afterwards:
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> …… ( over 100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> .....  (over 50+ times)
> On a side note, we have 200+ column families defined in the Cassandra database, which may be related to this amount of RPC traffic.
> P.S.2 The excess schema migration requests will eventually have InternalResponseStage performing schema merge operations. Since this operation requires a compaction for each merge, it is much slower to consume. Thus, the backlog of incoming schema migration content objects consumes all of the heap space and ultimately ends in OOM!


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
