mesos-issues mailing list archives

From "Baskar Sikkayan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7942) Mesos slave - docker job exits normally but reporting as TASK_FAILED
Date Thu, 07 Sep 2017 04:13:01 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156421#comment-16156421 ]

Baskar Sikkayan commented on MESOS-7942:
----------------------------------------

docker status is Exited (0)

docker ps -a


    CONTAINER ID     IMAGE                COMMAND                  CREATED         STATUS                     PORTS   NAMES
    789ec59e32442    docker-test:latest   "/bin/sh -c 'java -ja"   2 minutes ago   Exited (0) 2 minutes ago           mesos-05fb536b-asdff-3d634c4ed860-S1.be003d0e-7701-4020-84be-234643565244
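
For anyone who wants to script the same check, here is a minimal sketch that pulls the exit code out of the STATUS column printed by docker ps -a. The function name exit_code_from_status is hypothetical, and the sketch assumes the status text has the "Exited (N) ..." shape shown above; it does not call Docker itself.

```python
import re
from typing import Optional

# Matches the leading "Exited (N)" part of a `docker ps -a` STATUS value.
_EXITED = re.compile(r"^Exited \((-?\d+)\)")


def exit_code_from_status(status: str) -> Optional[int]:
    """Return the exit code from a STATUS value, or None if it is not an "Exited (N)" status."""
    match = _EXITED.match(status.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # STATUS value copied from the docker ps -a output above.
    print(exit_code_from_status("Exited (0) 2 minutes ago"))
```

A result of 0 here only confirms what Docker recorded for the container; it says nothing about why the executor reported TASK_FAILED.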

> Mesos slave - docker job exits normally but reporting as TASK_FAILED
> --------------------------------------------------------------------
>
>                 Key: MESOS-7942
>                 URL: https://issues.apache.org/jira/browse/MESOS-7942
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, docker
>    Affects Versions: 1.1.0, 1.2.1, 1.3.1
>         Environment: Kernel | OS | Snapshot: 3.8.13-98.7.1.el7uek | OL 7.3 | 7-2017.6.4
>            Reporter: Baskar Sikkayan
>
> Mesos version - 1.2.1.
> Jobs are scheduled using Chronos. The Docker job is invoked properly, but the task is still reported as TASK_FAILED even though it completes with exit status zero.
> Mesos slave logs:
> {code}
> I0906 04:15:03.311928    10 slave.cpp:1785] Launching task 'ct:1504671300002:0:Job_Task_Test:' for framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:03.314584    10 paths.cpp:547] Trying to chown ' /mesos-data/slave-2/slaves/f20ab78e-acd3-407a-b1b6-47d67a947eff-S1/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504671300002:0:Job_Task_Test:/runs/7cd8bd78-b20d-4db5-8435-4d1420cb1b93' to user 'root'
> I0906 04:15:03.315140    10 slave.cpp:6479] Launching executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 with resources cpus(*)(allocated: *):0.1; mem(*)(allocated: *):32 in work directory ' /mesos-data/slave-2/slaves/f20ab78e-acd3-407a-b1b6-47d67a947eff-S1/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504671300002:0:Job_Task_Test:/runs/7cd8bd78-b20d-4db5-8435-4d1420cb1b93'
> I0906 04:15:03.315809    10 slave.cpp:2118] Queued task 'ct:1504671300002:0:Job_Task_Test:' for executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:03.316238    12 docker.cpp:1165] Starting container '7cd8bd78-b20d-4db5-8435-4d1420cb1b93' for task 'ct:1504671300002:0:Job_Task_Test:' (and executor 'ct:1504671300002:0:Job_Task_Test:') of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:03.612807    10 docker.cpp:803] Checkpointing pid 248 to ' /mesos-data/slave-2/meta/slaves/f20ab78e-acd3-407a-b1b6-47d67a947eff-S1/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504671300002:0:Job_Task_Test:/runs/7cd8bd78-b20d-4db5-8435-4d1420cb1b93/pids/forked.pid'
> I0906 04:15:03.649960    10 slave.cpp:3385] Got registration for executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 from executor(1)@20.403.68.700:38740
> I0906 04:15:03.650584    11 docker.cpp:1608] Ignoring updating container 7cd8bd78-b20d-4db5-8435-4d1420cb1b93 because resources passed to update are identical to existing resources
> I0906 04:15:03.650701    11 slave.cpp:2331] Sending queued task 'ct:1504671300002:0:Job_Task_Test:' to executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 at executor(1)@20.403.68.700:38740
> I0906 04:15:05.255101    10 slave.cpp:3816] Handling status update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 from executor(1)@20.403.68.700:38740
> I0906 04:15:05.255280    10 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:05.255551    10 status_update_manager.cpp:832] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:05.255697     9 slave.cpp:4256] Forwarding the update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to master@20.403.68.700:5050
> I0906 04:15:05.255803     9 slave.cpp:4166] Sending acknowledgement for status update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to executor(1)@20.403.68.700:38740
> I0906 04:15:05.260083    10 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:05.260114    10 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:13.090368    13 slave.cpp:3816] *{color:#f6c342}Handling status update TASK_FAILED{color}* (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 from executor(1)@20.403.68.700:38740
> I0906 04:15:13.164096    13 status_update_manager.cpp:323] Received status update TASK_FAILED (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:13.164135    13 status_update_manager.cpp:832] Checkpointing UPDATE for status update TASK_FAILED (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:13.164289    10 slave.cpp:4256] Forwarding the update TASK_FAILED (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to master@20.403.68.700:5050
> I0906 04:15:13.164397    10 slave.cpp:4166] Sending acknowledgement for status update TASK_FAILED (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to executor(1)@20.403.68.700:38740
> I0906 04:15:13.172888    12 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:13.172940    12 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_FAILED (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c) for task ct:1504671300002:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
> I0906 04:15:14.092870    11 slave.cpp:4388] Got exited event for executor(1)@20.403.68.700:38740
> I0906 04:15:14.168128    11 docker.cpp:2397] Executor for container 7cd8bd78-b20d-4db5-8435-4d1420cb1b93 has exited
> I0906 04:15:14.168166    11 docker.cpp:2091] Destroying container 7cd8bd78-b20d-4db5-8435-4d1420cb1b93
> I0906 04:15:14.168196    11 docker.cpp:2218] Running docker stop on container 7cd8bd78-b20d-4db5-8435-4d1420cb1b93
> I0906 04:15:14.170940    15 slave.cpp:4768] *Executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 exited with status 0*
> I0906 04:15:14.170967    15 slave.cpp:4868] Cleaning up executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 at executor(1)@20.403.68.700:38740
> {code}
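
The mismatch in the log above (a TASK_FAILED update followed by an executor exit status of 0) can be pulled out mechanically. The following is a rough sketch, not part of the original report: the function name summarize and the two regular expressions are assumptions based only on the message wording seen in this log, and the sample lines are truncated copies of three lines above with the JIRA emphasis markup removed.

```python
import re
from typing import List, Optional, Tuple

# Patterns are assumptions derived from the wording of the log lines in this report.
HANDLING = re.compile(r"Handling\s+status\s+update\s+(TASK_[A-Z_]+)")
EXITED = re.compile(r"exited\s+with\s+status\s+(-?\d+)")


def summarize(log_text: str) -> Tuple[List[str], Optional[int]]:
    """Return the task states the agent handled, in order, and the executor exit status if logged."""
    states = HANDLING.findall(log_text)
    match = EXITED.search(log_text)
    return states, (int(match.group(1)) if match else None)


# Truncated excerpts of three lines from the log above.
SAMPLE = """\
I0906 04:15:05.255101    10 slave.cpp:3816] Handling status update TASK_RUNNING (UUID: 35a9a010-d623-45a3-9d1e-bdbc6942129f)
I0906 04:15:13.090368    13 slave.cpp:3816] Handling status update TASK_FAILED (UUID: 62ecb989-f260-42b0-ba99-d22fa7210a4c)
I0906 04:15:14.170940    15 slave.cpp:4768] Executor 'ct:1504671300002:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 exited with status 0
"""

if __name__ == "__main__":
    states, exit_status = summarize(SAMPLE)
    print(states, exit_status)
```

Note that in this log the TASK_FAILED update is received from the executor at 04:15:13, about a second before the agent logs the executor exit, so the agent appears to be forwarding a state that the executor itself reported.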



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
