cassandra-commits mailing list archives

From "Kurt Greaves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13209) test failure in cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_blogposts_with_max_connections
Date Fri, 19 May 2017 01:50:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016784#comment-16016784 ]

Kurt Greaves commented on CASSANDRA-13209:
------------------------------------------

Actually, it's always less, and it appears the majority of the issues come from the COPY FROM.
Not every test failure seems to have the same cause, but the majority appear to happen
because timeouts in the COPY FROM don't actually get retried, so some rows
don't get written. Also, once the COPY FROM code in cqlsh gets over 1000 failed rows it exits,
which would explain why a lot of the failures show a difference of 1000 rows.

The corresponding error is the following. Note that it mentions the number of attempts, but
the attempt count never goes past 1.
{code}
<stdin>:2:Failed to import 9 rows: OperationTimedOut - errors={'127.0.0.3': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.3,  will retry later, attempt 1 of 5
{code}

I'm going to fix the error handling in the COPY FROM command so that it actually retries (as
COPY TO does). That should make the tests less flaky: COPY TO also suffers from the timeouts,
but it retries them and so rarely has issues. There is still the underlying problem
of why COPY is timing out in the first place, but to be honest I'd put it down to the command
and the nodes simply using too many resources on the servers. If it's still very flaky after
fixing the retries, we can look into performance issues more.
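
As a rough illustration of the behaviour described above, here is a minimal Python sketch of
retrying timed-out batches up to a fixed number of attempts and aborting once too many rows
have failed. It is not the cqlsh COPY FROM code: import_batches, make_flaky_sender, send and
max_failed_rows are hypothetical names, and OperationTimedOut is a local stand-in for the
driver's exception. The defaults of 5 attempts and 1000 failed rows are assumptions taken
from the error message and the row limit mentioned in this comment.

```python
class OperationTimedOut(Exception):
    """Local stand-in for the driver's client-side timeout error (assumption)."""


def import_batches(batches, send, max_attempts=5, max_failed_rows=1000):
    """Send each batch, retrying timeouts up to max_attempts times.

    Returns (rows_written, rows_failed). Stops early once more than
    max_failed_rows rows have been given up on.
    """
    written = 0
    failed = 0
    for batch in batches:
        for attempt in range(1, max_attempts + 1):
            try:
                send(batch)
                written += len(batch)
                break
            except OperationTimedOut:
                # Only count the rows as failed once every attempt is used up.
                if attempt == max_attempts:
                    failed += len(batch)
        if failed > max_failed_rows:
            # Too many rows lost: abort the import, leaving later batches unsent.
            break
    return written, failed


def make_flaky_sender(failures_per_batch):
    """Hypothetical sender that times out the first N calls for each batch."""
    calls = {}

    def send(batch):
        calls[batch] = calls.get(batch, 0) + 1
        if calls[batch] <= failures_per_batch:
            raise OperationTimedOut("Client request timeout")

    return send


if __name__ == "__main__":
    batches = [tuple(range(i * 10, i * 10 + 10)) for i in range(5)]
    # A single attempt per batch loses every row that hits a timeout.
    print(import_batches(batches, make_flaky_sender(1), max_attempts=1))
    # With real retries the same transient timeouts are absorbed.
    print(import_batches(batches, make_flaky_sender(1), max_attempts=5))
```

With only one effective attempt, every transient timeout turns into lost rows, and a long
enough run of them trips the failed-row limit and ends the import early. With working
retries, the same timeouts cost time but no rows.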



> test failure in cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_blogposts_with_max_connections
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13209
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Shuler
>            Assignee: Kurt Greaves
>              Labels: dtest, test-failure
>         Attachments: node1.log, node2.log, node3.log, node4.log, node5.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_dtest/528/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_blogposts_with_max_connections
> {noformat}
> Error Message
> errors={'127.0.0.4': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.4
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-792s6j
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> dtest: DEBUG: removing ccm cluster test at: /tmp/dtest-792s6j
> dtest: DEBUG: clearing ssl stores from [/tmp/dtest-792s6j] directory
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-uNMsuW
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> cassandra.policies: INFO: Using datacenter 'datacenter1' for DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.3 datacenter1> discovered
> cassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.2 datacenter1> discovered
> cassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.5 datacenter1> discovered
> cassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.4 datacenter1> discovered
> dtest: DEBUG: Running stress with user profile /home/automaton/cassandra-dtest/cqlsh_tests/blogposts.yaml
> --------------------- >> end captured logging << ---------------------
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/dtest.py", line 1090, in wrapped
>     f(obj)
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", line 2571, in test_bulk_round_trip_blogposts_with_max_connections
>     copy_from_options={'NUMPROCESSES': 2})
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", line 2500, in _test_bulk_round_trip
>     num_records = create_records()
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", line 2473, in create_records
>     ret = rows_to_list(self.session.execute(count_statement))[0][0]
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 1998, in execute
>     return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 3784, in result
>     raise self._final_exception
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

