mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Armand Grillet (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-7361) Command checks via agent pollute agent logs.
Date Wed, 08 Nov 2017 11:39:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243758#comment-16243758
] 

Armand Grillet edited comment on MESOS-7361 at 11/8/17 11:38 AM:
-----------------------------------------------------------------

This issue is due to the fact that if the check is a COMMAND check, the checker will delegate
the execution of the check to the Mesos agent via the {{LaunchNestedContainerSession}} API
call: https://github.com/apache/mesos/blob/b13c4c3683fd6bad702a7fb9e24cfc3414b921da/src/checks/checker_process.cpp#L555?
If we use the universal containerizer, this means following that workflow: https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/edit#


We do not differentiate regular nested containers and health checks (that are periodically
started nested containers), this results in 26 lines of logs for one command health check
on a Mesos agent due to its nested container state:

{code}
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118629  4727 containerizer.cpp:1056] Trying
to chown '/var/lib/mesos/slave/slaves/d71d7fe2-348d-4091-9c26-445d103b3cf1-S1/frameworks/d71d7fe2-348d-4091-9c26-445d103b3cf1-0001/executors/instance-pod-with-healthcheck.8c197d66-b347-11e7-a146-42c7ec5fa9b9/runs/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf'
to user 'root'
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118783  4727 containerizer.cpp:1085] Starting
container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.119021  4727 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from PROVISIONING to PREPARING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.144028  4721 switchboard.cpp:575] Created
I/O switchboard server (pid: 6669) listening on socket file '/tmp/mesos-io-switchboard-2ee398cd-f448-4bb9-9d80-e57069b9c199'
for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.171833  4723 linux_launcher.cpp:429] Launching
nested container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
and cloning with namespaces
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.203270  4725 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from PREPARING to ISOLATING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206295  4725 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from ISOLATING to FETCHING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206835  4729 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from FETCHING to RUNNING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.241371  4728 http.cpp:2854] Received EOF
attach response for 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242112  4725 containerizer.cpp:2168] Destroying
container
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
in RUNNING state
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.242159  4721 http.cpp:2869] Launch nested
container session connection for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
closed
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242184  4725 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from RUNNING to DESTROYING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242352  4729 linux_launcher.cpp:505] Asked
to destroy container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.243599  4729 linux_launcher.cpp:548] Using
freezer to destroy cgroup mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.246537  4724 cgroups.cpp:3055] Freezing
cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.250340  4722 cgroups.cpp:1413] Successfully
froze cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
after 3.11808ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.253630  4729 cgroups.cpp:3073] Thawing cgroup
/sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.256338  4722 cgroups.cpp:1442] Successfully
thawed cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
after 2.663936ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336752  4729 switchboard.cpp:888] I/O switchboard
server process for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
has terminated (status=0)
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336876  4725 containerizer.cpp:2614] Container
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
has exited
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338035  4724 container_assigner.cpp:101]
Unregistering container_id[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value:
"ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348"
} }].
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.338174  4843 container_assigner_strategy.cpp:192]
No ephemeral-port reader had been assigned to container[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf"
parent { value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348"
} }], cannot unregister
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338228  4843 container_state_cache_impl.cpp:145]
Removing container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]
Oct 30 14:43:15 mesos-agent[4718]: E1017 14:43:15.338282  4843 container_state_cache_impl.cpp:148]
Failed to remove container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]:
No such file or directory
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338367  4725 containerizer.cpp:2452] Checkpointing
termination state to nested container's runtime directory '/var/run/mesos/containers/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf/termination'
{code}

This is excessive as, for a cluster running 100 tasks using the universal containerizer each
having a COMMAND health check running every 30 seconds, this would mean around 5200 lines
of logs per minute.



was (Author: armandgrillet):
This issue is due to the fact that if the check is a COMMAND check, the checker will delegate
the execution of the check to the Mesos agent via the {{LaunchNestedContainerSession}} API
call: https://github.com/apache/mesos/blob/b13c4c3683fd6bad702a7fb9e24cfc3414b921da/src/checks/checker_process.cpp#L555?
If we use the universal containerizer, this means following that workflow: https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/edit#


We do not differentiate regular nested containers and health checks (that are periodically
started nested containers), this results in 26 lines of logs for one command health check
on a Mesos agent due to its nested container state:

{code}
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118629  4727 containerizer.cpp:1056] Trying
to chown '/var/lib/mesos/slave/slaves/d71d7fe2-348d-4091-9c26-445d103b3cf1-S1/frameworks/d71d7fe2-348d-4091-9c26-445d103b3cf1-0001/executors/instance-pod-with-healthcheck.8c197d66-b347-11e7-a146-42c7ec5fa9b9/runs/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf'
to user 'root'
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118783  4727 containerizer.cpp:1085] Starting
container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.119021  4727 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from PROVISIONING to PREPARING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.144028  4721 switchboard.cpp:575] Created
I/O switchboard server (pid: 6669) listening on socket file '/tmp/mesos-io-switchboard-2ee398cd-f448-4bb9-9d80-e57069b9c199'
for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.171833  4723 linux_launcher.cpp:429] Launching
nested container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
and cloning with namespaces
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.203270  4725 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from PREPARING to ISOLATING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206295  4725 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from ISOLATING to FETCHING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206835  4729 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from FETCHING to RUNNING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.241371  4728 http.cpp:2854] Received EOF
attach response for 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242112  4725 containerizer.cpp:2168] Destroying
container
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
in RUNNING state
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.242159  4721 http.cpp:2869] Launch nested
container session connection for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
closed
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242184  4725 containerizer.cpp:2714] Transitioning
the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
from RUNNING to DESTROYING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242352  4729 linux_launcher.cpp:505] Asked
to destroy container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.243599  4729 linux_launcher.cpp:548] Using
freezer to destroy cgroup mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.246537  4724 cgroups.cpp:3055] Freezing
cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.250340  4722 cgroups.cpp:1413] Successfully
froze cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
after 3.11808ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.253630  4729 cgroups.cpp:3073] Thawing cgroup
/sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.256338  4722 cgroups.cpp:1442] Successfully
thawed cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
after 2.663936ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336752  4729 switchboard.cpp:888] I/O switchboard
server process for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
has terminated (status=0)
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336876  4725 containerizer.cpp:2614] Container
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
has exited
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338035  4724 container_assigner.cpp:101]
Unregistering container_id[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value:
"ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348"
} }].
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.338174  4843 container_assigner_strategy.cpp:192]
No ephemeral-port reader had been assigned to container[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf"
parent { value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348"
} }], cannot unregister
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338228  4843 container_state_cache_impl.cpp:145]
Removing container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]
Oct 30 14:43:15 mesos-agent[4718]: E1017 14:43:15.338282  4843 container_state_cache_impl.cpp:148]
Failed to remove container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]:
No such file or directory
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338367  4725 containerizer.cpp:2452] Checkpointing
termination state to nested container's runtime directory '/var/run/mesos/containers/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf/termination'
{code}

This is excessive as, for a cluster running 100 tasks each having a health check running every
30 seconds, this would mean around 5200 lines of logs per minute.


> Command checks via agent pollute agent logs.
> --------------------------------------------
>
>                 Key: MESOS-7361
>                 URL: https://issues.apache.org/jira/browse/MESOS-7361
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent
>            Reporter: Alexander Rukletsov
>            Assignee: Armand Grillet
>              Labels: check, health-check, mesosphere
>
> Command checks via agent leverage debug container API of the agent to start checks. Each
such invocation triggers a bunch of logs on the agent, because the API was not originally
designed with periodic invocations in mind. We should find a way to avoid excessive logging
on the agent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message