Date: Sun, 12 Mar 2017 14:27:04 +0000 (UTC)
From: "Aleksandr Sorokoumov (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

     [ https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-13196:
---------------------------------------------
    Reviewer: Alex Petrov
      Status: Patch Available  (was: Open)

The failure in the test ("keyspace keyspace1 does not exist") happened because, during the pre-bootstrap schema migration, all the migration tasks failed to complete and the node bootstrapped with its schema out of sync. {{MigrationManager.waitUntilReadyForBootstrap}} (which is invoked by {{StorageService.waitForSchema}}) simply waits for the in-flight tasks to finish and discards any that take longer than {{MIGRATION_TASK_WAIT_IN_SECONDS}} to complete. Schema migration tasks are scheduled whenever an endpoint's state changes significantly: it joins the cluster, becomes alive, or its schema version changes.
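
To make the difference concrete, here is a minimal Python model of the wait-and-discard behaviour described above, next to a retrying variant of the kind discussed in the rest of this comment. It is only a sketch, not Cassandra code: {{MigrationTask}}, {{wait_and_discard}}, {{wait_with_retry}}, the {{resubmit}} and {{is_reachable}} callbacks, and the shortened timeout are all hypothetical names and values chosen for illustration.

```python
import threading
from collections import deque

# Hypothetical, shortened stand-in for MIGRATION_TASK_WAIT_IN_SECONDS.
TIMEOUT = 0.05


class MigrationTask:
    """Illustrative in-flight schema pull; `done` is set when a reply arrives."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.done = threading.Event()


def wait_and_discard(inflight, timeout=TIMEOUT):
    """Wait for each in-flight task once; tasks that time out are dropped."""
    completed = []
    while inflight:
        task = inflight.popleft()
        if task.done.wait(timeout):
            completed.append(task.endpoint)
        # else: the task is discarded, so the schema may never be pulled
    return completed


def wait_with_retry(inflight, resubmit, is_reachable, timeout=TIMEOUT):
    """Resubmit timed-out tasks for as long as their endpoint is reachable."""
    completed = []
    while inflight:
        task = inflight.popleft()
        if task.done.wait(timeout):
            completed.append(task.endpoint)
        elif is_reachable(task.endpoint):
            # Assumption: restarting a timed-out task is safe.
            inflight.append(resubmit(task.endpoint))
        # else: the endpoint is unreachable, so the task is given up
    return completed
```

In this model, if every task exceeds the timeout, {{wait_and_discard}} returns an empty list and the caller carries on without any schema, which mirrors the failure seen in the test. {{wait_with_retry}} returns only once each task has either succeeded or had its endpoint reported unreachable, so it loops forever if a reachable endpoint never replies.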

The idea is that it is safe to restart a migration task that has timed out, because either the task will succeed on one of the next retries or it will eventually be killed by {{FailureDetector}} once the endpoint is marked as unreachable. AFAIU there will be at least one migration task per endpoint. With the retry mechanism, {{MigrationManager.waitUntilReadyForBootstrap}} will run until the migration tasks to all the reachable nodes succeed. This means that either we will receive the migration data from at least one of the nodes, or all the nodes will be unreachable, in which case the bootstrap is supposed to fail anyway.

*Steps to reproduce*

To test the retry, I commented out sending the reply in {{org.apache.cassandra.schema.SchemaPullVerbHandler.doVerb}} and ran the original {{snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address}} test.

_NB:_ the test will run forever because, without a response, the migration requests time out and are then restarted.

*Code*

https://github.com/Gerrrr/cassandra/tree/13196-3.11

*CI builds*:

* https://cassci.datastax.com/job/ifesdjeen-13196-trunk-dtest/
* https://cassci.datastax.com/job/ifesdjeen-13196-trunk-testall/

> test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13196
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Shuler
>            Assignee: Aleksandr Sorokoumov
>              Labels: dtest, test-failure
>         Attachments: node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does not exist"
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host discovered
> --------------------- >> end captured logging << ---------------------
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in test_prefer_local_reconnect_on_listen_address
>     new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 1998, in execute
>     return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 3784, in result
>     raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does not exist"\n-------------------- >> begin captured logging << --------------------\ndtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{ \'initial_token\': None,\n \'num_tokens\': \'32\',\n \'phi_convict_threshold\': 5,\n \'range_request_timeout_in_ms\': 10000,\n \'read_request_timeout_in_ms\': 10000,\n \'request_timeout_in_ms\': 10000,\n \'truncate_request_timeout_in_ms\': 10000,\n \'write_request_timeout_in_ms\': 10000}\ncassandra.policies: INFO: Using datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host discovered\n--------------------- >> end captured logging << ---------------------'
> {novnode}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)