mesos-issues mailing list archives

From "Neil Conway (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-6974) DefaultExecutorTest.CommitSuicideOnTaskFailure test is flaky.
Date Tue, 23 May 2017 18:02:04 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Conway updated MESOS-6974:
-------------------------------
    Priority: Major  (was: Critical)

Downgrading from "Critical" to "Major" -- AFAICS this is a flaky test and should be fixed,
but isn't more serious than other known flaky tests (of which there are unfortunately quite
a few).

> DefaultExecutorTest.CommitSuicideOnTaskFailure test is flaky.
> -------------------------------------------------------------
>
>                 Key: MESOS-6974
>                 URL: https://issues.apache.org/jira/browse/MESOS-6974
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.1.0
>         Environment: Mac OS 10.11.6 with clang-703.0.31
>            Reporter: Alexander Rukletsov
>            Assignee: Anand Mazumdar
>              Labels: flaky-test
>         Attachments: default_executor_tests.txt
>
>
> This test seems to be racy. For some reason the shutdown process in the default executor
> stalls. Sometimes the executor manages to quit (well, segfault) before the agent tries to
> resend the last task status update, but sometimes not, which leads to the test failure. It
> seems that the executor should not hang during termination, which may indicate a bug in the
> executor and not just in the test.
> {noformat}
> I0123 11:52:29.001549 3211264 master.cpp:5855] Status update TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd)
for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
from agent 3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 at slave(5)@192.168.9.40:60268 (alexr)
> I0123 11:52:29.001581 3211264 master.cpp:5917] Forwarding status update TASK_FAILED (UUID:
699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:29.001713 3211264 master.cpp:7956] Updating the state of task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc
of framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
> I0123 11:52:29.002049 528384 hierarchical.cpp:1011] Recovered cpus(*):0.1; mem(*):32;
disk(*):32 (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated:
cpus(*):0.1; mem(*):32; disk(*):32) on agent 3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 from
framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:29.002229 4284416 scheduler.cpp:676] Enqueuing event UPDATE received from
http://192.168.9.40:60268/master/api/v1/scheduler
> I0123 11:52:29.784299 3211264 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:29.784381 3211264 hierarchical.cpp:1279] Performed allocation for 1 agents
in 726us
> I0123 11:52:29.784638 528384 master.cpp:6671] Sending 1 offers to framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
(default)
> I0123 11:52:29.785650 4284416 scheduler.cpp:676] Enqueuing event OFFERS received from
http://192.168.9.40:60268/master/api/v1/scheduler
> I0123 11:52:30.003669 3211264 default_executor.cpp:693] Shutting down
> E0123 11:52:30.004431 4820992 process.cpp:2419] Failed to shutdown socket with fd 13:
Socket is not connected
> E0123 11:52:30.005080 4820992 process.cpp:2419] Failed to shutdown socket with fd 11:
Socket is not connected
> E0123 11:52:30.005573 4820992 process.cpp:2419] Failed to shutdown socket with fd 12:
Socket is not connected
> W0123 11:52:30.005645 2138112 process.cpp:3022] Attempted to spawn a process (__shutdown_executor__(1)@192.168.9.40:60313)
after finalizing libprocess!
> W0123 11:52:30.005695 2138112 process.cpp:3022] Attempted to spawn a process (__async_executor__(6)@192.168.9.40:60313)
after finalizing libprocess!
> E0123 11:52:30.005971 4820992 process.cpp:2419] Failed to shutdown socket with fd 14:
Socket is not connected
> I0123 11:52:30.789027 528384 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:30.789062 528384 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:30.789083 528384 hierarchical.cpp:1279] Performed allocation for 1 agents
in 110us
> I0123 11:52:31.793439 4284416 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:31.793472 4284416 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:31.793485 4284416 hierarchical.cpp:1279] Performed allocation for 1 agents
in 99us
> I0123 11:52:32.797495 3211264 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:32.797535 3211264 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:32.797554 3211264 hierarchical.cpp:1279] Performed allocation for 1 agents
in 120us
> I0123 11:52:33.798820 4284416 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:33.798849 4284416 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:33.798862 4284416 hierarchical.cpp:1279] Performed allocation for 1 agents
in 91us
> I0123 11:52:34.801596 3747840 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:34.801638 3747840 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:34.801659 3747840 hierarchical.cpp:1279] Performed allocation for 1 agents
in 134us
> I0123 11:52:35.804436 2674688 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:35.804479 2674688 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:35.804500 2674688 hierarchical.cpp:1279] Performed allocation for 1 agents
in 148us
> I0123 11:52:36.808641 3747840 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:36.808677 3747840 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:36.808696 3747840 hierarchical.cpp:1279] Performed allocation for 1 agents
in 115us
> I0123 11:52:37.812849 2674688 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:37.812885 2674688 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:37.812904 2674688 hierarchical.cpp:1279] Performed allocation for 1 agents
in 134us
> I0123 11:52:38.817015 3747840 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:38.817044 3747840 hierarchical.cpp:1772] No inverse offers to send out!
> I0123 11:52:38.817059 3747840 hierarchical.cpp:1279] Performed allocation for 1 agents
in 92us
> W0123 11:52:39.002764 1064960 status_update_manager.cpp:478] Resending status update
TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc
of framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:39.002830 1064960 status_update_manager.cpp:377] Forwarding update TASK_FAILED
(UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc
of framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 to the agent
> I0123 11:52:39.002985 2138112 slave.cpp:4196] Forwarding the update TASK_FAILED (UUID:
699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 to master@192.168.9.40:60268
> I0123 11:52:39.003178 3211264 master.cpp:5855] Status update TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd)
for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
from agent 3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 at slave(5)@192.168.9.40:60268 (alexr)
> I0123 11:52:39.003211 3211264 master.cpp:5917] Forwarding status update TASK_FAILED (UUID:
699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:39.003393 3211264 master.cpp:7956] Updating the state of task 5cfe9ce6-f53b-4906-bb76-3ca6179489bc
of framework 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
> I0123 11:52:39.004077 1064960 scheduler.cpp:676] Enqueuing event UPDATE received from
http://192.168.9.40:60268/master/api/v1/scheduler
> ../../../src/tests/default_executor_tests.cpp:930: Failure
> Mock function called more times than expected - returning directly.
>     Function call: update(0x7fff53295420, @0x7fbe51902530 32-byte object <D0-04 70-15
01-00 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 50-25 90-51 BE-7F 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> {noformat}
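
The race described in the report can be illustrated with a small, self-contained model. This is only a sketch and not Mesos code: the function name `countSchedulerUpdates` and its parameters are hypothetical. It assumes the last status update stays unacknowledged, that the agent resends it at a fixed interval, and that the test stops observing once the executor has exited. The 10-second interval used in the note below is read off the log timestamps (11:52:29.00 for the first TASK_FAILED, 11:52:39.00 for the resend), not taken from the Mesos source.

```cpp
#include <cassert>

// Hypothetical model (not part of Mesos): counts how many UPDATE events
// the scheduler mock would see before the executor has exited.
//
// initialUpdates   - updates delivered before the executor starts shutting down
// exitDelayMs      - time from the last update until the executor is gone
// retryIntervalMs  - assumed fixed resend interval for an unacknowledged update
int countSchedulerUpdates(int initialUpdates,
                          long long exitDelayMs,
                          long long retryIntervalMs)
{
  assert(initialUpdates >= 0);
  assert(exitDelayMs >= 0);
  assert(retryIntervalMs > 0);

  // Every retry that fires while the executor is still alive results in
  // one more delivery of the same update to the scheduler.
  long long resends = exitDelayMs / retryIntervalMs;

  return initialUpdates + static_cast<int>(resends);
}
```

Under these assumptions, `countSchedulerUpdates(2, 1000, 10000)` yields 2 (the executor exits well before the first resend), while `countSchedulerUpdates(2, 10001, 10000)` yields 3, which corresponds to the "Expected: to be called twice / Actual: called 3 times" failure in the log.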



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)