impala-issues mailing list archives

From "Matthew Jacobs (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5733) Kudu tservers seem to be unresponsive after TestKuduMemLimits
Date Fri, 28 Jul 2017 15:35:00 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Jacobs resolved IMPALA-5733.
------------------------------------
    Resolution: Fixed

We addressed this by changing the EC2 instance type. The problem is not necessarily Kudu's
memory usage, but rather the overall memory usage of the minicluster and test processes.
IMPALA-5737 tracks the more general issue for the longer term.

> Kudu tservers seem to be unresponsive after TestKuduMemLimits
> -------------------------------------------------------------
>
>                 Key: IMPALA-5733
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5733
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.10.0
>            Reporter: Matthew Jacobs
>            Assignee: Matthew Jacobs
>            Priority: Critical
>              Labels: flaky-test, kudu
>             Fix For: Impala 2.10.0
>
>         Attachments: impalad1.log.tar.gz, jenkins-console.log, kudu-master.log, kudu-tserver1.log, kudu-tserver2.log, syslog.out
>
>
> Two of [~henryr]'s gvo runs for https://gerrit.cloudera.org/#/c/5715/ had failed jobs after Kudu tservers became unresponsive: [1|https://jenkins.impala.io/job/gerrit-verify-dryrun/938] [2|https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1777/]
> It looks to me like Kudu is still working throughout the execution of test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan.
> Afterwards, however, at least one tserver seems to become fully unresponsive or to crash, although no stack trace or dump appears to be generated.
> In https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1777/ these tests run fine:
> {code}
> ...
> 06:15:16 query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_select_with_empty_resultset PASSED
> 06:16:23 query_test/test_kudu.py::TestKuduOperations::test_kudu_alter_table[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] PASSED
> 06:17:06 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-1] PASSED
> 06:17:10 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-10] PASSED
> 06:17:11 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-0] PASSED
> 06:17:13 query_test/test_lifecycle.py::TestFragmentLifecycleWithDebugActions::test_failure_in_prepare PASSED
> {code}
> Shortly after (starting at 06:23), the first tserver starts reporting that other tablet leaders are unavailable:
> {code}
> ...
> I0727 06:17:06.674669 20878 ts_tablet_manager.cc:1042] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Tablet deleted. Last logged OpId: 1.1
> I0727 06:17:06.674680 20878 log.cc:974] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting WAL directory at /home/ubuntu/Impala/testdata/cluster/cdh5/node-1/var/lib/kudu/ts/wal/wals/9bfcf775073844d0aa6251e7ee486375
> I0727 06:17:06.674784 20878 ts_tablet_manager.cc:1060] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata
> I0727 06:17:06.675710 20877 ts_tablet_manager.cc:1042] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Tablet deleted. Last logged OpId: 1.1
> I0727 06:17:06.675725 20877 log.cc:974] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting WAL directory at /home/ubuntu/Impala/testdata/cluster/cdh5/node-1/var/lib/kudu/ts/wal/wals/bbdd11b90f804c5a94a3242a27bbe2c7
> I0727 06:17:06.675817 20877 ts_tablet_manager.cc:1060] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata
> I0727 06:23:58.746656 114414 raft_consensus.cc:411] T df6cfc0be3494e52b31b93d2298d1663 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
> I0727 06:23:58.744894 111995 raft_consensus.cc:411] T 3092a2a1be4e47c3aa26e260c1eea55b P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)
> I0727 06:23:58.728032 112454 raft_consensus.cc:411] T 4b74a3b8327943648255613c164c0b03 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
> I0727 06:23:58.741551 114377 raft_consensus.cc:411] T ada7c413be4941979ed4e6cb659de772 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
> ...
> I0727 06:23:58.960822 111910 raft_consensus.cc:411] T 27a5b15a78254fa1890137a0f3df9276 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
> I0727 06:23:58.961027 114591 raft_consensus.cc:411] T ffec3d0d55db4cd1a5f9c9b7e0199acf P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
> I0727 06:23:58.961103 114665 raft_consensus.cc:411] T 1ed3de3d7ce0420880b2146cc0572329 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)
> I0727 06:23:58.961292 112077 raft_consensus.cc:411] T 904259aacdee4b52902b54cf4a48422a P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
> {code}
> The other Kudu tserver logs just end at 6:17, but there is no indication that they crashed.
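
One way to locate the quiet window in a tserver log like the one quoted above is to parse the glog-style timestamps and look for the largest gap between consecutive entries. The following is only a rough sketch, not something taken from the Impala or Kudu tooling: the names `GLOG_PREFIX`, `parse_timestamps` and `largest_gap` are made up for illustration, the year is assumed to be 2017 (glog lines carry no year; it is taken from the date of this message), and the sample lines are shortened copies of three entries from the excerpt. Entries are sorted first because the excerpt shows that lines from different threads are not strictly in timestamp order.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_PREFIX = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^\s:]+):(\d+)\]"
)


def parse_timestamps(lines, year=2017):
    """Return a datetime for every line that starts with a glog prefix.

    The year is an assumption supplied by the caller; glog does not log it.
    Leading '>' quote markers and spaces are stripped before matching.
    """
    stamps = []
    for line in lines:
        match = GLOG_PREFIX.match(line.lstrip("> "))
        if match:
            text = f"{year}{match.group(2)} {match.group(3)}"
            stamps.append(datetime.strptime(text, "%Y%m%d %H:%M:%S.%f"))
    return stamps


def largest_gap(stamps):
    """Return (gap, before, after) for the widest pause, or None if < 2 stamps."""
    ordered = sorted(stamps)  # threads may log slightly out of order
    gaps = [(b - a, a, b) for a, b in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


# Shortened copies of three lines from the excerpt above.
sample = [
    "> I0727 06:17:06.675817 20877 ts_tablet_manager.cc:1060] T bbdd11b90f804c5a94a3242a27bbe2c7",
    "> I0727 06:23:58.746656 114414 raft_consensus.cc:411] T df6cfc0be3494e52b31b93d2298d1663",
    "> I0727 06:23:58.728032 112454 raft_consensus.cc:411] T 4b74a3b8327943648255613c164c0b03",
]

gap, before, after = largest_gap(parse_timestamps(sample))
print(f"quiet for {gap.total_seconds():.0f}s between {before.time()} and {after.time()}")
```

For these three sample lines the gap comes out at roughly 412 seconds, between 06:17:06 and 06:23:58. Running the same functions over each full tserver log would show whether the other tservers stopped logging at the same moment.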



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
