cassandra-commits mailing list archives

From "Aleksandr Sorokoumov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
Date Sun, 12 Mar 2017 14:27:04 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-13196:
---------------------------------------------
    Reviewer: Alex Petrov
      Status: Patch Available  (was: Open)

The test failed with "keyspace keyspace1 does not exist" because all the migration tasks
failed to complete during the pre-bootstrap schema migration, so the node was bootstrapped
with its schema out of sync.
{{MigrationManager.waitUntilReadyForBootstrap}} (which is invoked by {{StorageService.waitForSchema}})
only waits for the in-flight tasks to finish and discards those that take longer than {{MIGRATION_TASK_WAIT_IN_SECONDS}}
to complete.
Schema migration tasks are scheduled when an endpoint's state changes significantly: it
joins the cluster, becomes alive, or its schema version changes.

The idea is that it is safe to restart a migration task that has timed out: the task will
either succeed on one of the following retries or eventually be killed by {{FailureDetector}}
once the endpoint is marked as unreachable.
AFAIU there will be at least one migration task per endpoint. With the retry mechanism, {{MigrationManager.waitUntilReadyForBootstrap}}
will run until the migration tasks for all the reachable nodes succeed.
This means that either we receive the migration data from at least one of the nodes, or
all the nodes are unreachable, in which case the bootstrap is supposed to fail anyway.
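
The retry idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class {{MigrationRetrySketch}}, the {{SchemaPull}} interface and the {{isAlive}} predicate are hypothetical stand-ins for the migration task and for {{FailureDetector}}, and a timeout is modelled simply as a failed attempt.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative only: none of these names exist in Cassandra.
class MigrationRetrySketch {

    /** Hypothetical stand-in for one schema pull; returns false when the attempt times out. */
    interface SchemaPull {
        boolean attempt(String endpoint);
    }

    /**
     * Retries timed-out pulls until every reachable endpoint has succeeded.
     * Endpoints reported as not alive (the failure detector's role) are dropped.
     * Returns the number of endpoints that delivered their schema.
     */
    static int waitUntilReady(List<String> endpoints, SchemaPull pull, Predicate<String> isAlive) {
        Deque<String> pending = new ArrayDeque<>(endpoints);
        int succeeded = 0;
        while (!pending.isEmpty()) {
            String endpoint = pending.poll();
            if (!isAlive.test(endpoint)) {
                continue;              // unreachable: give up on this endpoint
            }
            if (pull.attempt(endpoint)) {
                succeeded++;           // schema received from this endpoint
            } else {
                pending.add(endpoint); // timed out: schedule a retry
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        Map<String, Integer> attempts = new HashMap<>();
        // Assumed behaviour for the demo: each pull times out twice, then succeeds.
        SchemaPull flaky = ep -> attempts.merge(ep, 1, Integer::sum) >= 3;
        int ok = waitUntilReady(Arrays.asList("127.0.0.1", "127.0.0.2"),
                                flaky,
                                ep -> !ep.equals("127.0.0.2")); // second node is "down"
        System.out.println(ok + " endpoint(s) delivered schema; attempts = " + attempts);
    }
}
```

In this sketch the loop ends only when each endpoint has either succeeded or been reported as down, so a reachable endpoint that never replies keeps it running indefinitely.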

*Steps to reproduce*

To test the retry, I commented out the sending of the reply in {{org.apache.cassandra.schema.SchemaPullVerbHandler.doVerb}}
and ran the original {{snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address}}
test.
_NB:_ the test will run forever because, without a response, the migration requests time out
and are then restarted.

*Code*
https://github.com/Gerrrr/cassandra/tree/13196-3.11

*CI builds*:

* https://cassci.datastax.com/job/ifesdjeen-13196-trunk-dtest/
* https://cassci.datastax.com/job/ifesdjeen-13196-trunk-testall/

> test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13196
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Shuler
>            Assignee: Aleksandr Sorokoumov
>              Labels: dtest, test-failure
>         Attachments: node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log,
node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does not exist"
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy (via host
'127.0.0.1'); if incorrect, please specify a local_dc to the constructor, or limit contact
points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.1 dc1> discovered
> --------------------- >> end captured logging << ---------------------
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in test_prefer_local_reconnect_on_listen_address
>     new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 1998, in execute
>     return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile,
paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 3784, in result
>     raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does not exist"\n--------------------
>> begin captured logging << --------------------\ndtest: DEBUG: cluster ccm directory:
/tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   \'initial_token\':
None,\n    \'num_tokens\': \'32\',\n    \'phi_convict_threshold\': 5,\n    \'range_request_timeout_in_ms\':
10000,\n    \'read_request_timeout_in_ms\': 10000,\n    \'request_timeout_in_ms\': 10000,\n
   \'truncate_request_timeout_in_ms\': 10000,\n    \'write_request_timeout_in_ms\': 10000}\ncassandra.policies:
INFO: Using datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if incorrect,
please specify a local_dc to the constructor, or limit contact points to local cluster nodes\ncassandra.cluster:
INFO: New Cassandra host <Host: 127.0.0.1 dc1> discovered\n--------------------- >>
end captured logging << ---------------------'
> {novnode}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
