Date: Wed, 2 Sep 2015 04:31:45 +0000 (UTC)
From: "Andrew Hust (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-10250) Executing lots of schema alters concurrently can lead to dropped alters

     [ https://issues.apache.org/jira/browse/CASSANDRA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Hust updated CASSANDRA-10250:
------------------------------------
    Description: 

A recently added [dtest|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/132/testReport/junit/concurrent_schema_changes_test/TestConcurrentSchemaChanges/create_lots_of_schema_churn_test/] has been flapping on cassci and has exposed an issue with running lots of schema alterations concurrently. The failures occur on healthy clusters but seem to occur at higher rates when 1 node is down during the alters.

The test executes the following – 440 total commands:
- Create 20 new tables
- Drop 7 columns one at a time across 20 tables
- Add 7 columns one at a time across 20 tables
- Add one column index on each of the 7 columns on 20 tables

The outcome is random. Most of the failures are dropped columns that are still present, but incorrect new columns and indexes have also been observed. The logs contain no exceptions, and the columns/indexes that end up incorrect don't seem to follow a pattern. Running {{nodetool describecluster}} on each node shows the same schema id on all nodes.

Attached is a python script extracted from the dtest. Running it against a local 3-node cluster will reproduce the issue (with enough runs – it fails ~20% of the time on my machine).

Also attached are the node logs from a run where a dropped column (alter_me_7 table, column s1) is still present. Checking the system_schema tables for this case shows the s1 column in both the columns and dropped_columns tables.

This has been flapping on cassci on versions 2+ and doesn't seem to be related to changes in 3.0. More testing needs to be done, though.

//cc [~enigmacurry]
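For illustration, below is a minimal sketch of the kind of concurrent schema churn the test issues. It is not the attached concurrent_schema_changes.py; it assumes a local cluster reachable at 127.0.0.1, the DataStax python driver (cassandra-driver), and made-up names (a churn_ks keyspace, alter_me_N tables, s1..s7 starting columns, c1..c7 added columns).

{code}
# Illustrative sketch only; NOT the attached concurrent_schema_changes.py.
# Assumes a local cluster reachable at 127.0.0.1 and the DataStax python driver
# (pip install cassandra-driver); keyspace/table/column names are made up here.
from concurrent.futures import ThreadPoolExecutor

from cassandra.cluster import Cluster


def run_concurrently(session, statements, workers=10):
    """Fan a batch of schema statements out over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator and re-raises any statement failure.
        list(pool.map(session.execute, statements))


def churn(hosts=('127.0.0.1',), tables=20, columns=7):
    cluster = Cluster(list(hosts))
    session = cluster.connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS churn_ks WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 3}")
    session.set_keyspace('churn_ks')

    # Create 20 new tables, each starting with 7 droppable columns (s1..s7).
    for t in range(tables):
        cols = ', '.join('s%d text' % c for c in range(1, columns + 1))
        session.execute('CREATE TABLE alter_me_%d (id int PRIMARY KEY, %s)' % (t, cols))

    # Drop the 7 starting columns, one ALTER at a time, across the 20 tables.
    run_concurrently(session, ['ALTER TABLE alter_me_%d DROP s%d' % (t, c)
                               for t in range(tables) for c in range(1, columns + 1)])
    # Add 7 new columns, one ALTER at a time, across the 20 tables.
    run_concurrently(session, ['ALTER TABLE alter_me_%d ADD c%d text' % (t, c)
                               for t in range(tables) for c in range(1, columns + 1)])
    # Add one secondary index per new column on each of the 20 tables.
    run_concurrently(session, ['CREATE INDEX ix_%d_%d ON alter_me_%d (c%d)' % (t, c, t, c)
                               for t in range(tables) for c in range(1, columns + 1)])

    cluster.shutdown()


if __name__ == '__main__':
    churn()
{code}

This issues the same shape of workload as the test (20 creates plus 140 drops, 140 adds and 140 index creations, 440 commands in total), with each phase fanned out over a thread pool so many alters are in flight at once; the attached script remains the authoritative reproduction.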
  was:
A recently added [dtest|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/132/testReport/junit/concurrent_schema_changes_test/TestConcurrentSchemaChanges/create_lots_of_schema_churn_test/] has been flapping on cassci and has exposed an issue with running lots of schema alterations concurrently. The failures occur on healthy clusters but seem to occur at higher rates when 1 node is down during the alters.

The test executes the following – 440 total commands:
- Create 20 new tables
- Drop 7 columns one at a time across 20 tables
- Add 7 columns on at time across 20 tables
- Add one column index on each of the 7 columns on 20 tables

The outcome is random. Most of the failures are dropped columns that are still present, but incorrect new columns and indexes have also been observed. The logs contain no exceptions, and the columns/indexes that end up incorrect don't seem to follow a pattern. Running {{nodetool describecluster}} on each node shows the same schema id on all nodes.

Attached is a python script extracted from the dtest. Running it against a local 3-node cluster will reproduce the issue (with enough runs – it fails ~20% of the time on my machine).

Also attached are the node logs from a run where a dropped column (alter_me_7 table, column s1) is still present. Checking the system_schema tables for this case shows the s1 column in both the columns and dropped_columns tables.

This has been flapping on cassci on versions 2+ and doesn't seem to be related to changes in 3.0. More testing needs to be done, though.

//cc [~enigmacurry]


> Executing lots of schema alters concurrently can lead to dropped alters
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10250
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andrew Hust
>        Attachments: concurrent_schema_changes.py, node1.log, node2.log, node3.log
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
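For the specific symptom described above (a dropped column that is still present), a rough check along the following lines can be run after the churn. It assumes Cassandra 3.0's system_schema keyspace and reuses the illustrative names from the sketch above; it is not taken from the attached script.

{code}
# Rough check for the reported symptom: a column recorded as dropped that is
# still listed as a live column. Assumes the Cassandra 3.0 system_schema layout
# and the illustrative churn_ks/alter_me_N names from the sketch above.
from cassandra.cluster import Cluster


def dropped_column_still_present(keyspace, table, column, hosts=('127.0.0.1',)):
    cluster = Cluster(list(hosts))
    session = cluster.connect()

    live = session.execute(
        "SELECT column_name FROM system_schema.columns "
        "WHERE keyspace_name = %s AND table_name = %s", (keyspace, table))
    dropped = session.execute(
        "SELECT column_name FROM system_schema.dropped_columns "
        "WHERE keyspace_name = %s AND table_name = %s", (keyspace, table))

    live_names = {row.column_name for row in live}
    dropped_names = {row.column_name for row in dropped}
    cluster.shutdown()

    # The failure mode is the column appearing in both tables: the drop was
    # recorded, yet the column is still present in the live schema.
    return column in live_names and column in dropped_names


if __name__ == '__main__':
    # e.g. the failing case from the attached logs: table alter_me_7, column s1
    if dropped_column_still_present('churn_ks', 'alter_me_7', 's1'):
        print('dropped column still present; schema alteration was lost')
{code}

Since {{nodetool describecluster}} reports the same schema id on every node in the failing runs, a single-node check like this should be representative of the whole cluster.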