Date: Wed, 2 Sep 2015 04:31:45 +0000 (UTC)
From: "Andrew Hust (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-10250) Executing lots of schema alters concurrently can lead to dropped alters

     [ https://issues.apache.org/jira/browse/CASSANDRA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Hust updated CASSANDRA-10250:
------------------------------------
    Description: 

A recently added [dtest|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/132/testReport/junit/concurrent_schema_changes_test/TestConcurrentSchemaChanges/create_lots_of_schema_churn_test/] has been flapping on cassci and has exposed an issue with running lots of schema alterations concurrently. The failures occur on healthy clusters but seem to occur at higher rates when 1 node is down during the alters.

The test executes the following – 440 total commands:
- Create 20 new tables
- Drop 7 columns one at a time across 20 tables
- Add 7 columns one at a time across 20 tables
- Add one column index on each of the 7 columns on 20 tables

The outcome is random. Most of the failures are dropped columns that are still present, but incorrect new columns and indexes have also been observed. The logs contain no exceptions, and the columns/indexes that end up incorrect don't seem to follow a pattern. Running {{nodetool describecluster}} on each node shows the same schema id on all nodes.

Attached is a python script extracted from the dtest. Running it against a local 3-node cluster will reproduce the issue (with enough runs – it fails ~20% of the time on my machine).

Also attached are the node logs from a run where a dropped column (alter_me_7 table, column s1) is still present. Checking the system_schema tables for this case shows the s1 column in both the columns and dropped_columns tables.

This has been flapping on cassci on versions 2+ and doesn't seem to be related to changes in 3.0. More testing needs to be done, though.

//cc [~enigmacurry]
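For illustration, below is a minimal sketch of the kind of concurrent schema churn the test issues. It is not the attached concurrent_schema_changes.py; it assumes a local cluster reachable at 127.0.0.1, the DataStax python driver (cassandra-driver), and made-up names (a churn_ks keyspace, alter_me_N tables, s1..s7 starting columns, c1..c7 added columns).

{code}
# Illustrative sketch only; NOT the attached concurrent_schema_changes.py.
# Assumes a local cluster reachable at 127.0.0.1 and the DataStax python driver
# (pip install cassandra-driver); keyspace/table/column names are made up here.
from concurrent.futures import ThreadPoolExecutor

from cassandra.cluster import Cluster


def run_concurrently(session, statements, workers=10):
    """Fan a batch of schema statements out over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator and re-raises any statement failure.
        list(pool.map(session.execute, statements))


def churn(hosts=('127.0.0.1',), tables=20, columns=7):
    cluster = Cluster(list(hosts))
    session = cluster.connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS churn_ks WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 3}")
    session.set_keyspace('churn_ks')

    # Create 20 new tables, each starting with 7 droppable columns (s1..s7).
    for t in range(tables):
        cols = ', '.join('s%d text' % c for c in range(1, columns + 1))
        session.execute('CREATE TABLE alter_me_%d (id int PRIMARY KEY, %s)' % (t, cols))

    # Drop the 7 starting columns, one ALTER at a time, across the 20 tables.
    run_concurrently(session, ['ALTER TABLE alter_me_%d DROP s%d' % (t, c)
                               for t in range(tables) for c in range(1, columns + 1)])
    # Add 7 new columns, one ALTER at a time, across the 20 tables.
    run_concurrently(session, ['ALTER TABLE alter_me_%d ADD c%d text' % (t, c)
                               for t in range(tables) for c in range(1, columns + 1)])
    # Add one secondary index per new column on each of the 20 tables.
    run_concurrently(session, ['CREATE INDEX ix_%d_%d ON alter_me_%d (c%d)' % (t, c, t, c)
                               for t in range(tables) for c in range(1, columns + 1)])

    cluster.shutdown()


if __name__ == '__main__':
    churn()
{code}

This issues the same shape of workload as the test (20 creates plus 140 drops, 140 adds and 140 index creations, 440 commands in total), with each phase fanned out over a thread pool so many alters are in flight at once; the attached script remains the authoritative reproduction.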
  was:
A recently added [dtest|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/132/testReport/junit/concurrent_schema_changes_test/TestConcurrentSchemaChanges/create_lots_of_schema_churn_test/] has been flapping on cassci and has exposed an issue with running lots of schema alterations concurrently. The failures occur on healthy clusters but seem to occur at higher rates when 1 node is down during the alters.

The test executes the following – 440 total commands:
- Create 20 new tables
- Drop 7 columns one at a time across 20 tables
- Add 7 columns on at time across 20 tables
- Add one column index on each of the 7 columns on 20 tables

The outcome is random. Most of the failures are dropped columns that are still present, but incorrect new columns and indexes have also been observed. The logs contain no exceptions, and the columns/indexes that end up incorrect don't seem to follow a pattern. Running {{nodetool describecluster}} on each node shows the same schema id on all nodes.

Attached is a python script extracted from the dtest. Running it against a local 3-node cluster will reproduce the issue (with enough runs – it fails ~20% of the time on my machine).

Also attached are the node logs from a run where a dropped column (alter_me_7 table, column s1) is still present. Checking the system_schema tables for this case shows the s1 column in both the columns and dropped_columns tables.

This has been flapping on cassci on versions 2+ and doesn't seem to be related to changes in 3.0. More testing needs to be done, though.

//cc [~enigmacurry]


> Executing lots of schema alters concurrently can lead to dropped alters
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10250
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andrew Hust
>        Attachments: concurrent_schema_changes.py, node1.log, node2.log, node3.log
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
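For the specific symptom described above (a dropped column that is still present), a rough check along the following lines can be run after the churn. It assumes Cassandra 3.0's system_schema keyspace and reuses the illustrative names from the sketch above; it is not taken from the attached script.

{code}
# Rough check for the reported symptom: a column recorded as dropped that is
# still listed as a live column. Assumes the Cassandra 3.0 system_schema layout
# and the illustrative churn_ks/alter_me_N names from the sketch above.
from cassandra.cluster import Cluster


def dropped_column_still_present(keyspace, table, column, hosts=('127.0.0.1',)):
    cluster = Cluster(list(hosts))
    session = cluster.connect()

    live = session.execute(
        "SELECT column_name FROM system_schema.columns "
        "WHERE keyspace_name = %s AND table_name = %s", (keyspace, table))
    dropped = session.execute(
        "SELECT column_name FROM system_schema.dropped_columns "
        "WHERE keyspace_name = %s AND table_name = %s", (keyspace, table))

    live_names = {row.column_name for row in live}
    dropped_names = {row.column_name for row in dropped}
    cluster.shutdown()

    # The failure mode is the column appearing in both tables: the drop was
    # recorded, yet the column is still present in the live schema.
    return column in live_names and column in dropped_names


if __name__ == '__main__':
    # e.g. the failing case from the attached logs: table alter_me_7, column s1
    if dropped_column_still_present('churn_ks', 'alter_me_7', 's1'):
        print('dropped column still present; schema alteration was lost')
{code}

Since {{nodetool describecluster}} reports the same schema id on every node in the failing runs, a single-node check like this should be representative of the whole cluster.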