mesos-issues mailing list archives

From "Gilbert Song (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-8004) Failed to kill all processes in the container due to cgroup freeze failure
Date Fri, 22 Sep 2017 07:37:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song updated MESOS-8004:
--------------------------------
         Labels: launcher  (was: )
    Component/s: containerization

> Failed to kill all processes in the container due to cgroup freeze failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8004
>                 URL: https://issues.apache.org/jira/browse/MESOS-8004
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization
>    Affects Versions: 1.2.1
>         Environment: CentOS Linux release 7.2.1511 (Core) 
> 3.10.0-327.36.3.el7.x86_64
>            Reporter: Haiwei Zhou
>              Labels: launcher
>
> When using the Mesos unified containerizer, the executor cannot be destroyed because the
> cgroup freeze operation fails. The agent logs show that the launcher tries to freeze the
> cgroup several times before the destroy times out. However, the content of
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8/freezer.state is "FROZEN".
> {quote}
> I0921 18:00:58.339440  3493 containerizer.cpp:2465] Container e2778ccd-c7e5-4289-b382-e05f063200d8
has exited
> I0921 18:00:58.339519  3493 containerizer.cpp:2102] Destroying container e2778ccd-c7e5-4289-b382-e05f063200d8
in RUNNING state
> I0921 18:00:58.339645  3484 linux_launcher.cpp:505] Asked to destroy container e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:00:58.340553  3484 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:00:58.342226  3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:00.042708  3475 slave.cpp:5155] Killing executor '47eb9350-9ab4-41f8-a5cd-39e855532b53'
of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
> I0921 18:01:02.009097  3483 process.cpp:3704] Handling HTTP event for process 'slave(1)'
with path: '/slave(1)/containers'
> W0921 18:01:02.011672  3491 containerizer.cpp:2055] Skipping status for container e2778ccd-c7e5-4289-b382-e05f063200d8
because: Container does not exist
> I0921 18:01:04.269701  3487 slave.cpp:5732] Querying resource estimator for oversubscribable
resources
> I0921 18:01:04.269775  3487 slave.cpp:5266] Current disk usage 0.11%. Max allowed age:
6.292478769607581days
> I0921 18:01:04.270349  3506 slave.cpp:5746] Received oversubscribable resources {} from
the resource estimator
> I0921 18:01:08.300772  3474 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
> I0921 18:01:08.345176  3517 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:08.347452  3517 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
after 2.183168ms
> I0921 18:01:08.347561  3517 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> E0921 18:01:15.192441  3524 perf_event.cpp:176] Perf sample of 10secs failed to complete
within 12secs; sampling will be halted
> E0921 18:01:15.192819  3489 perf_event.cpp:199] Failed to get the perf sample: timeout
> I0921 18:01:18.350342  3488 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:18.352532  3488 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
after 2.121984ms
> I0921 18:01:18.352646  3481 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:19.301443  3520 slave.cpp:5732] Querying resource estimator for oversubscribable
resources
> I0921 18:01:19.301566  3501 slave.cpp:5746] Received oversubscribable resources {} from
the resource estimator
> I0921 18:01:23.307291  3518 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
> I0921 18:01:28.121094  3491 process.cpp:3704] Handling HTTP event for process 'metrics'
with path: '/metrics/snapshot'
> I0921 18:01:28.355551  3493 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:28.357792  3493 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
after 2.177024ms
> I0921 18:01:28.357890  3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:34.302625  3503 slave.cpp:5732] Querying resource estimator for oversubscribable
resources
> I0921 18:01:34.302738  3483 slave.cpp:5746] Received oversubscribable resources {} from
the resource estimator
> I0921 18:01:38.315979  3505 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
> I0921 18:01:38.360709  3511 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:38.362891  3511 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
after 2.12608ms
> I0921 18:01:38.362993  3475 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:48.366251  3492 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:48.368404  3496 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
after 2.080256ms
> I0921 18:01:48.368501  3496 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> E0921 18:01:58.342779  3478 slave.cpp:4746] Termination of executor '47eb9350-9ab4-41f8-a5cd-39e855532b53'
of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 failed: Failed to kill all processes
in the container: Timed out after 1mins
> I0921 18:01:58.342830  3478 slave.cpp:4868] Cleaning up executor '47eb9350-9ab4-41f8-a5cd-39e855532b53'
of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
> I0921 18:01:58.364516  3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8'
for gc 6.99999578195556days in the future
> I0921 18:01:58.364591  3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53'
for gc 6.9999957811437days in the future
> I0921 18:01:58.364604  3478 slave.cpp:4956] Cleaning up framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364615  3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8'
for gc 6.99999578062519days in the future
> I0921 18:01:58.364670  3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53'
for gc 6.99999578024296days in the future
> I0921 18:01:58.364683  3479 status_update_manager.cpp:285] Closing status update streams
for framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364702  3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110'
for gc 6.9999957791437days in the future
> I0921 18:01:58.364725  3479 status_update_manager.cpp:531] Cleaning up status update
stream for task 47eb9350-9ab4-41f8-a5cd-39e855532b53 of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364740  3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110'
for gc 6.99999577881778days in the future
> {quote}
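
The repeated freeze/thaw cycle in the quoted log can be measured with a short script. The following is a minimal sketch, not part of Mesos: the names `LINE_RE`, `freezer_events`, `freeze_intervals`, and `SAMPLE` are hypothetical, and the regular expression assumes the glog-style line prefix seen in the log above. Because that prefix carries no year, only the time of day is parsed, so the sketch assumes all events fall within the same day.

```python
import re
from datetime import datetime

# Assumed glog-style prefix: severity + MMDD, HH:MM:SS.micros, thread id, file:line]
# Only "Freezing cgroup" / "Thawing cgroup" lines from cgroups.cpp are matched;
# "Successfully thawed cgroup" lines are deliberately ignored.
LINE_RE = re.compile(
    r"[IWEF]\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"cgroups\.cpp:\d+\] (?P<action>Freezing|Thawing) cgroup (?P<path>\S+)"
)

# Three lines copied from the agent log quoted in this message.
SAMPLE = """\
> I0921 18:00:58.342226  3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:08.345176  3517 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:08.347561  3517 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
"""


def freezer_events(log_text):
    """Return (timestamp, action, cgroup_path) tuples for freeze/thaw log lines."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            # The date part defaults to 1900-01-01; only differences are used.
            ts = datetime.strptime(m.group("time"), "%H:%M:%S.%f")
            events.append((ts, m.group("action"), m.group("path")))
    return events


def freeze_intervals(events):
    """Return the seconds elapsed between consecutive 'Freezing' attempts."""
    freezes = [ts for ts, action, _ in events if action == "Freezing"]
    return [(b - a).total_seconds() for a, b in zip(freezes, freezes[1:])]


if __name__ == "__main__":
    events = freezer_events(SAMPLE)
    for ts, action, path in events:
        print(ts.time(), action, path)
    print("seconds between freeze attempts:", freeze_intervals(events))
```

Run against the full log above, this kind of script shows a new freeze attempt roughly every 10 seconds until the agent reports "Timed out after 1mins" at 18:01:58.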



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
