aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1809) Investigate flaky test TestRunnerKillProcessGroup.test_pg_is_killed
Date Tue, 24 Jan 2017 23:07:26 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836806#comment-15836806 ]

Stephan Erb commented on AURORA-1809:
-------------------------------------

This test fails because of the recently introduced {{PR_SET_CHILD_SUBREAPER}}. Only when the
full suite runs is {{setup_child_subreaping()}} called before the above-mentioned test case,
which is why the test passes when run on its own. If we additionally call
{{setup_child_subreaping()}} from within the test case, it fails every time.
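
To illustrate why a subreaper flag could keep {{child_pid}} visible, here is a minimal, Linux-only sketch. It is not Aurora's {{setup_child_subreaping()}}; the helper names {{set_child_subreaper}}, {{is_child_subreaper}} and {{orphan_stays_visible}} are hypothetical, and the link to this failure is an assumption based on documented {{prctl(2)}} behaviour: once a process is a subreaper, orphaned descendants are reparented to it rather than to init, and they stay in /proc as zombies until it waits on them.

```python
import ctypes
import os
import sys
import time

# Option values from <linux/prctl.h> (available since Linux 3.4).
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37


def _prctl(option, arg2):
    """Call prctl(2) through libc; returns the raw int result (0 on success)."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    return libc.prctl(option, arg2, zero, zero, zero)


def set_child_subreaper(enable=True):
    """Hypothetical helper: mark this process as a child subreaper."""
    if not sys.platform.startswith("linux"):
        return False
    return _prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1 if enable else 0)) == 0


def is_child_subreaper():
    """Hypothetical helper: read the subreaper flag back from the kernel."""
    if not sys.platform.startswith("linux"):
        return False
    flag = ctypes.c_int(0)
    if _prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(flag)) != 0:
        return False
    return flag.value == 1


def orphan_stays_visible():
    """Fork a parent that forks a child and exits at once.

    Returns True if the orphaned child still has a /proc entry after it
    has exited, i.e. nobody has reaped it yet.
    """
    r, w = os.pipe()
    parent = os.fork()
    if parent == 0:
        os.close(r)
        child = os.fork()
        if child == 0:
            os.close(w)
            time.sleep(0.2)  # outlive the parent so we become an orphan
            os._exit(0)
        os.write(w, str(child).encode())  # report the child's pid upward
        os._exit(0)
    os.close(w)
    child_pid = int(os.read(r, 32).decode())
    os.close(r)
    os.waitpid(parent, 0)  # reap the intermediate parent
    time.sleep(0.5)        # give the orphan time to exit
    visible = os.path.exists("/proc/%d" % child_pid)
    try:
        os.waitpid(child_pid, 0)  # succeeds only if it was reparented to us
    except OSError:
        pass
    return visible


if __name__ == "__main__":
    print("subreaper set:", set_child_subreaper())
    print("orphan still in /proc:", orphan_stays_visible())
```

If that is the mechanism here, an assertion such as {{child_pid not in ps.pids()}} would keep failing until the subreaping process reaps the child, for example with {{os.waitpid}}.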


> Investigate flaky test TestRunnerKillProcessGroup.test_pg_is_killed
> -------------------------------------------------------------------
>
>                 Key: AURORA-1809
>                 URL: https://issues.apache.org/jira/browse/AURORA-1809
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Zameer Manji
>             Fix For: 0.17.0
>
>
> If you run it as part of the full test suite, it fails like this:
> {noformat}
>  ==================== FAILURES ====================
>                      __ TestRunnerKillProcessGroup.test_pg_is_killed __
>                      
>                      self = <test_staged_kill.TestRunnerKillProcessGroup object at 0x7f0c79893e10>
>                      
>                          def test_pg_is_killed(self):
>                            runner = self.start_runner()
>                            tm = TaskMonitor(runner.tempdir, runner.task_id)
>                            self.wait_until_running(tm)
>                            process_state, run_number = tm.get_active_processes()[0]
>                            assert process_state.process == 'process'
>                            assert run_number == 0
>                          
>                            child_pidfile = os.path.join(runner.sandbox, runner.task_id, 'child.txt')
>                            while not os.path.exists(child_pidfile):
>                              time.sleep(0.1)
>                            parent_pidfile = os.path.join(runner.sandbox, runner.task_id, 'parent.txt')
>                            while not os.path.exists(parent_pidfile):
>                              time.sleep(0.1)
>                            with open(child_pidfile) as fp:
>                              child_pid = int(fp.read().rstrip())
>                            with open(parent_pidfile) as fp:
>                              parent_pid = int(fp.read().rstrip())
>                          
>                            ps = ProcessProviderFactory.get()
>                            ps.collect_all()
>                            assert parent_pid in ps.pids()
>                            assert child_pid in ps.pids()
>                            assert child_pid in ps.children_of(parent_pid)
>                          
>                            with open(os.path.join(runner.sandbox, runner.task_id, 'exit.txt'), 'w') as fp:
>                              fp.write('go away!')
>                          
>                            while tm.task_state() is not TaskState.SUCCESS:
>                              time.sleep(0.1)
>                          
>                            state = tm.get_state()
>                            assert state.processes['process'][0].state == ProcessState.SUCCESS
>                          
>                            ps.collect_all()
>                            assert parent_pid not in ps.pids()
>                      >     assert child_pid not in ps.pids()
>                      E     assert 30475 not in set([1, 2, 3, 5, 7, 8, ...])
>                      E      +  where set([1, 2, 3, 5, 7, 8, ...]) = <bound method ProcessProvider_Procfs.pids of <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object at 0x7f0c798b1990>>()
>                      E      +    where <bound method ProcessProvider_Procfs.pids of <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object at 0x7f0c798b1990>> = <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object at 0x7f0c798b1990>.pids
>                      
>                      src/test/python/apache/thermos/core/test_staged_kill.py:287: AssertionError
>                      -------------- Captured stderr call --------------
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                      WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
>                       generated xml file: /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
>                       1 failed, 719 passed, 6 skipped, 1 warnings in 206.00 seconds
>                      
> FAILURE
> {noformat}
> If you run the test as a one-off, you see this:
> {noformat}
> 00:45:32 00:00 [main]
>                (To run a reporting server: ./pants server)
> 00:45:32 00:00   [setup]
> 00:45:32 00:00     [parse]fatal: Not a git repository (or any of the parent directories): .git
>                Executing tasks in goals: test
> 00:45:32 00:00   [test]
> 00:45:32 00:00     [test-prep-command]
> 00:45:32 00:00     [test]
> 00:45:32 00:00     [pytest]
> 00:45:32 00:00       [run]
>                      ============== test session starts ===============
>                      platform linux2 -- Python 2.7.6 -- py-1.4.31 -- pytest-2.6.4 -- /usr/bin/python2.7
>                      plugins: cov, timeout
>                      collected 83 items
>                      src/test/python/apache/thermos/core/test_staged_kill.py::TestRunnerKillProcessGroup::test_pg_is_killed PASSED
>                       generated xml file: /home/vagrant/aurora/dist/test-results/src.test.python.apache.thermos.core.core.xml
>                       82 tests deselected by '-kTestRunnerKillProcessGroup'
>                       1 passed, 82 deselected, 1 warnings in 2.49 seconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)