mesos-user mailing list archives

From Jay Taylor <...@jaytaylor.com>
Subject Re: Can health-checks be run by Mesos for docker tasks?
Date Mon, 12 Oct 2015 22:26:38 GMT
Hi Marco,

What a relief!

I'd love to file the JIRA ticket for this, but I don't think my account has
permissions over on https://issues.apache.org/jira/browse/MESOS.  I
am "jaytaylor" over there.  Please let me know if you can help with that
and we can get the ball rolling on this.


On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <marco@mesosphere.io>
wrote:

> Jay:
>
> you hit the nail on the head: the direction is definitely one-way (from
> MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG
> env var.
> Others more familiar with the matter may correct me, but it looks like you
> have uncovered a bug in the executor code: could you please file a Jira for
> us to look into?
>
> It seems to me that, at present, the only workaround for you would be to
> set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by the
> executor.
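>
> For illustration, the one-way direction can be pictured with a minimal
> stand-alone sketch (plain C++ here, not the actual Mesos/stout code): both
> the MESOS_* environment variable and the --flag can populate the flag, but
> nothing ever writes the flag back into the environment, so a child process
> such as the docker executor will not see MESOS_LAUNCHER_DIR unless it was
> exported before the agent started.
>
> ```
> // Minimal sketch (plain C++, not the actual Mesos/stout code) of the
> // one-way direction between environment variables and flags.
> #include <cstdlib>
> #include <iostream>
> #include <map>
> #include <string>
>
> int main(int argc, char** argv) {
>   std::map<std::string, std::string> flags;
>
>   // 1. Environment -> flag (this direction exists).
>   if (const char* env = std::getenv("MESOS_LAUNCHER_DIR")) {
>     flags["launcher_dir"] = env;
>   }
>
>   // 2. Command line -> flag (this direction exists too).
>   const std::string prefix = "--launcher_dir=";
>   for (int i = 1; i < argc; i++) {
>     const std::string arg = argv[i];
>     if (arg.rfind(prefix, 0) == 0) {
>       flags["launcher_dir"] = arg.substr(prefix.size());
>     }
>   }
>
>   // 3. Flag -> environment: there is no such step, so a subprocess spawned
>   //    from here would not inherit MESOS_LAUNCHER_DIR.
>   std::cout << "launcher_dir flag: "
>             << (flags.count("launcher_dir") ? flags["launcher_dir"] : "<unset>")
>             << "\n"
>             << "MESOS_LAUNCHER_DIR in environment: "
>             << (std::getenv("MESOS_LAUNCHER_DIR") ? "present" : "absent")
>             << "\n";
>   return 0;
> }
> ```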
>
>
> --
> *Marco Massenzio*
> Distributed Systems Engineer
> http://codetrips.com
>
> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <jay@jaytaylor.com> wrote:
>
>> Hi Marco,
>>
>> My reply is inline below-
>>
>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <marco@mesosphere.io>
>> wrote:
>>
>>>
>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <marco@mesosphere.io>
>>> wrote:
>>>
>>>> Are those the stdout logs of the Agent? I don't see --launcher_dir set
>>>> there; however, if I look into one that is running off the same 0.24.1
>>>> package, this is what I see:
>>>>
>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>> --enforce_container_disk_quota="false"
>>>> --executor_registration_timeout="1mins"
>>>> --executor_shutdown_grace_period="5secs"
>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>> --launcher_dir="/usr/libexec/mesos"
>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>> --registration_backoff_factor="1secs"
>>>> --resource_monitoring_interval="1secs"
>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>> --revocable_cpu_low_priority="true"
>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>
>>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>>>> That agent is not run via the init command, though; I execute it
>>>> manually via the `run-agent.sh` in the same directory.
>>>>
>>>> I don't really think this matters, but I assume you also restarted the
>>>> agent after making the config changes?
>>>> (and, for your own sanity - you can double check the version by looking
>>>> at the very head of the logs).
>>>>
>>>
>> Yes I definitely restarted all mesos processes after config changes :)
>>
>> Here's the info equivalent to what you posted, from one of the slave's INFO logs:
>>
>> Log file created at: 2015/10/12 20:22:58
>>> Running on machine: mesos-worker2a
>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by
>>> root
>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>>> posix/cpu,posix/mem,filesystem/posix
>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>>> 192.168.225.59:5050
>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>> --enforce_container_disk_quota="false"
>>> --executor_registration_timeout="5mins"
>>> --executor_shutdown_grace_period="5secs"
>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>> --hadoop_home="" --help="false" --hostname="
>>> mesos-worker2a-hobart.gigawatt.io" --initialize_driver_logging="true"
>>> --ip="192.168.225.59" --isolation="posix/cpu,posix/mem" --
>>> *launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>> --registration_backoff_factor="1secs"
>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>
>>
>> The launcher dir is picked up by the mesos-slave process.  We can also
>> see the cmdline flag is picked up from /etc/mesos-slave like this:
>>
>> mesos-worker2a:~$ ps -ef | grep mesos
>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave
>>> --ip=192.168.225.59 --log_dir=/var/log/mesos --launcher_dir=/usr/libexec/mesos
>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t mesos-slave[9605]
>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>
>>
>>
>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env
>> var does not seem to get picked up here:
>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>> :
>>
>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>
>>
>> And argv[0] (which contains the slave work dir) is the path we see in the
>> task's stdout.
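>>
>> To make that fallback concrete, here is a tiny stand-alone reproduction of
>> the same logic (using std::filesystem rather than stout, purely as an
>> illustration): with MESOS_LAUNCHER_DIR unset, the helper path is derived
>> from the directory of argv[0], which for the docker executor ends up being
>> the sandbox.
>>
>> ```
>> // Stand-alone illustration (std::filesystem instead of stout) of the
>> // fallback quoted above: prefer MESOS_LAUNCHER_DIR, otherwise use the
>> // directory containing the running binary (argv[0]).
>> #include <cstdlib>
>> #include <filesystem>
>> #include <iostream>
>>
>> int main(int argc, char** argv) {
>>   namespace fs = std::filesystem;
>>
>>   const char* env = std::getenv("MESOS_LAUNCHER_DIR");
>>
>>   const fs::path launcherDir = (env != nullptr)
>>       ? fs::path(env)
>>       : fs::absolute(fs::path(argv[0])).parent_path();
>>
>>   // If the executor binary lives in the sandbox, mesos-health-check is
>>   // searched for in the sandbox too, which is the behaviour observed here.
>>   std::cout << (launcherDir / "mesos-health-check") << std::endl;
>>   return 0;
>> }
>> ```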
>>
>> I'm still having trouble understanding how flags defined in
>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>> you confirm if such a mechanism exists and if so where it is?
>>
>> Otherwise, if my understanding is correct and such a mechanism doesn't
>> exist:
>>
>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>
>> The lack of such a mechanism would explain the behavior I'm currently
>> observing.
>>
>> Thanks!
>> Jay
>>
>>
>>>>
>>>> [0] http://github.com/massenz/zk-mesos
>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Marco Massenzio*
>>>> Distributed Systems Engineer
>>>> http://codetrips.com
>>>>
>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <outtatime@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Haosdent and Mesos friends,
>>>>>
>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from
>>>>> the mesosphere apt repo:
>>>>>
>>>>> $ dpkg -l | grep mesos
>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>      amd64        Cluster resource manager with efficient resource isolation
>>>>>
>>>>> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir on
>>>>> the slaves:
>>>>>
>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>> /usr/libexec/mesos
>>>>>
>>>>> And yet the task health-checks are still being launched from the
>>>>> sandbox directory like before!
>>>>>
>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>>>> identical result (just as before on the cluster where many versions of
>>>>> mesos had been installed):
>>>>>
>>>>> STDOUT:
>>>>>
>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> Launching health check process:
>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>> 127.0.0.1:8000
>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> Health check process launched at pid: 11253
>>>>>
>>>>>
>>>>>
>>>>> STDERR:
>>>>>
>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> *Launching health check process:
>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>> 127.0.0.1:8000
>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> Health check process launched at pid: 11253
>>>>>
>>>>>
>>>>> Any ideas on where to go from here?  Is there any additional
>>>>> information I can provide?
>>>>>
>>>>> Thanks as always,
>>>>> Jay
>>>>>
>>>>>
>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>
>>>>>> Flags sent to the executor from the containerizer are stringified and
>>>>>> become command-line parameters when the executor is launched.
>>>>>>
>>>>>> You could see this in
>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
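>>>>>>
>>>>>> For illustration only (a hypothetical helper, not the actual docker.cpp
>>>>>> code), that stringification step amounts to turning each flag into a
>>>>>> --key="value" argument on the executor's command line, which is why the
>>>>>> flags show up verbatim in the task stdout:
>>>>>>
>>>>>> ```
>>>>>> // Hypothetical sketch of turning a flag map into command-line arguments
>>>>>> // of the form --key="value". Illustration, not the Mesos implementation.
>>>>>> #include <iostream>
>>>>>> #include <map>
>>>>>> #include <string>
>>>>>> #include <vector>
>>>>>>
>>>>>> std::vector<std::string> stringifyFlags(
>>>>>>     const std::map<std::string, std::string>& flags) {
>>>>>>   std::vector<std::string> argv;
>>>>>>   for (const auto& [name, value] : flags) {
>>>>>>     argv.push_back("--" + name + "=\"" + value + "\"");
>>>>>>   }
>>>>>>   return argv;
>>>>>> }
>>>>>>
>>>>>> int main() {
>>>>>>   const std::map<std::string, std::string> flags = {
>>>>>>       {"docker", "docker"},
>>>>>>       {"mapped_directory", "/mnt/mesos/sandbox"},
>>>>>>       {"stop_timeout", "0ns"},
>>>>>>   };
>>>>>>
>>>>>>   for (const std::string& arg : stringifyFlags(flags)) {
>>>>>>     std::cout << arg << "\n";
>>>>>>   }
>>>>>>   return 0;
>>>>>> }
>>>>>> ```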
>>>>>>
>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>> mentioned above.
>>>>>> ```
>>>>>>   string path =
>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>
>>>>>> ```
>>>>>> So I want to figure out why your argv[0] would become the sandbox dir,
>>>>>> not "/usr/libexec/mesos".
>>>>>>
>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <outtatime@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>
>>>>>>> Yes. The related code is located in
>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>
>>>>>>> In fact, environment variables starting with MESOS_ are loaded as
>>>>>>> flag variables.
>>>>>>>
>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
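>>>>>>>
>>>>>>> A rough sketch of that loading step (illustrative only; the real logic
>>>>>>> lives in stout's flags.hpp): scan the environment for names that start
>>>>>>> with MESOS_, strip the prefix, lower-case the remainder, and treat the
>>>>>>> result as a flag:
>>>>>>>
>>>>>>> ```
>>>>>>> // Illustrative sketch of loading MESOS_-prefixed environment variables
>>>>>>> // as flags. Not the stout implementation.
>>>>>>> #include <cctype>
>>>>>>> #include <iostream>
>>>>>>> #include <map>
>>>>>>> #include <string>
>>>>>>>
>>>>>>> extern char** environ;
>>>>>>>
>>>>>>> int main() {
>>>>>>>   const std::string prefix = "MESOS_";
>>>>>>>   std::map<std::string, std::string> flags;
>>>>>>>
>>>>>>>   for (char** env = environ; *env != nullptr; env++) {
>>>>>>>     const std::string entry = *env;   // e.g. "MESOS_LAUNCHER_DIR=/usr/libexec/mesos"
>>>>>>>     const size_t eq = entry.find('=');
>>>>>>>     if (eq == std::string::npos) continue;
>>>>>>>
>>>>>>>     std::string name = entry.substr(0, eq);
>>>>>>>     if (name.rfind(prefix, 0) != 0) continue;  // only names starting with MESOS_
>>>>>>>
>>>>>>>     std::string flag = name.substr(prefix.size());
>>>>>>>     for (char& c : flag) {
>>>>>>>       c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
>>>>>>>     }
>>>>>>>
>>>>>>>     flags[flag] = entry.substr(eq + 1);  // "launcher_dir" -> "/usr/libexec/mesos"
>>>>>>>   }
>>>>>>>
>>>>>>>   for (const auto& [name, value] : flags) {
>>>>>>>     std::cout << "--" << name << "=\"" << value << "\"\n";
>>>>>>>   }
>>>>>>>   return 0;
>>>>>>> }
>>>>>>> ```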
>>>>>>>
>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <outtatime@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> One question for you haosdent-
>>>>>>>>
>>>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>>>> to understand the mechanism.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if
>>>>>>>> the broken behavior experienced today still persists.
>>>>>>>>
>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>
>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>> flags.launcher_dir, which is used to find mesos-docker-executor
>>>>>>>> and mesos-health-check under this dir. Although the env var is not
>>>>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir
>>>>>>>> is read from it.
>>>>>>>>
>>>>>>>> For example, I run
>>>>>>>> ```
>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>> ```
>>>>>>>> before starting mesos-slave, so when I launch the slave I can find this
>>>>>>>> line in the slave log:
>>>>>>>> ```
>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>> ```
>>>>>>>>
>>>>>>>> And from your log, I'm not sure why your MESOS_LAUNCHER_DIR becomes the
>>>>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other
>>>>>>>> scripts?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <outtatime@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>>>
>>>>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>>>>> have determined that the env var is not present when it is being checked
>>>>>>>>> in src/docker/executor.cpp @ line 573:
>>>>>>>>>
>>>>>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>   string path =
>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->"
>>>>>>>>>>        << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>>>>> propagated up to the point of mesos-slave launch):
>>>>>>>>>
>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>> export
>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> TASK OUTPUT:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR:
>>>>>>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>> Starting task
>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>> Launching health check process:
>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The env var is not propagated when the docker executor is launched
>>>>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>
>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>>>   // by Mesos).
>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>       argv,
>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>>>       environment,
>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A little ways above we can see the environment is set up with the
>>>>>>>>> container task's defined env vars.
>>>>>>>>>
>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>
>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>   }
>>>>>>>>>
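>>>>>>>>>
>>>>>>>>> If that is indeed the gap, one possible fix would be to add the flag
>>>>>>>>> value to that environment map before the subprocess call above. This
>>>>>>>>> is only a rough sketch under that assumption, not an actual patch, and
>>>>>>>>> the helper name is made up:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> // Hypothetical sketch: export the agent's launcher_dir flag to the
>>>>>>>>> // docker executor as MESOS_LAUNCHER_DIR before spawning it, so the
>>>>>>>>> // env-var lookup in src/docker/executor.cpp succeeds. Not the actual
>>>>>>>>> // Mesos change.
>>>>>>>>> #include <iostream>
>>>>>>>>> #include <map>
>>>>>>>>> #include <string>
>>>>>>>>>
>>>>>>>>> void exportLauncherDir(
>>>>>>>>>     std::map<std::string, std::string>& environment,
>>>>>>>>>     const std::string& launcherDir) {
>>>>>>>>>   // Keep any value the operator already provided.
>>>>>>>>>   if (environment.count("MESOS_LAUNCHER_DIR") == 0) {
>>>>>>>>>     environment["MESOS_LAUNCHER_DIR"] = launcherDir;
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> int main() {
>>>>>>>>>   std::map<std::string, std::string> environment;  // as built from ExecutorInfo
>>>>>>>>>   exportLauncherDir(environment, "/usr/libexec/mesos");
>>>>>>>>>   std::cout << environment["MESOS_LAUNCHER_DIR"] << std::endl;
>>>>>>>>>   return 0;
>>>>>>>>> }
>>>>>>>>> ```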
>>>>>>>>>
>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <haosdent@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>
>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>> failing.
>>>>>>>>>>
>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>> before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or
>>>>>>>>>> from the same dir as mesos-docker-executor.
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <outtatime@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>
>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>
>>>>>>>>>>> STDOUT:
>>>>>>>>>>>
>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> STDERR:
>>>>>>>>>>>
>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>> childMain
>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>
>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>
>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>> Author: Anand Mazumdar <mazumdar.anand@gmail.com>
>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Jay
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <outtatime@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Update:
>>>>>>>>>>>>
>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>>>>> and package the latest master (0.26.x) and deployed it to the cluster, and
>>>>>>>>>>>> now health checks are working as advertised in both Marathon and my own
>>>>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>>>>>>
>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>>>>> executing the health checks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>>>>> some digging around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>
>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON
>>>>>>>>>>>>> to /tmp/X in both the TaskFactory as well as right before the task is sent
>>>>>>>>>>>>> to Mesos via driver.launchTasks:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>>>> driver =>
>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString()
>>>>>>>>>>>>>> + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted
>>>>>>>>>>>>> the marathon service.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>
>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>> across the wire to Mesos.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>
>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered
>>>>>>>>>>>>>> on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <haosdent@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>>>> could know whether the health check is running or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <haosdent@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Marathon also uses Mesos health checks. When I use a health
>>>>>>>>>>>>>>> check, I can see a log like this in the executor stdout.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <haosdent@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, launch the health check through its definition in
>>>>>>>>>>>>>>>> taskinfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are
>>>>>>>>>>>>>>>>> you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base
>>>>>>>>>>>>>>>>> and I'm not very familiar with it, so my analysis this far has by no means
>>>>>>>>>>>>>>>>> been exhaustive.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <haosdent@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When the health check launches, there will be a log like this in
>>>>>>>>>>>>>>>>> your executor stdout
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output
>>>>>>>>>>>>>>>>>> in the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <haosdent@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>> whether you see an unhealthy status in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport,
>>>>>>>>>>>>>>>>>>>>> let me double check.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.
>>>>>>>>>>>>>>>>>>>>>> I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
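>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> To picture what that looks like (an illustrative sketch matching the
>>>>>>>>>>>>>>>>>>>>>>>> shape of the "Launching health check process" log lines earlier in this
>>>>>>>>>>>>>>>>>>>>>>>> thread, not the exact Mesos code), the COMMAND value is wrapped so it
>>>>>>>>>>>>>>>>>>>>>>>> executes inside the task's container:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>> // Illustrative sketch: wrap a COMMAND health check so it runs inside
>>>>>>>>>>>>>>>>>>>>>>>> // the task's container via `docker exec`. Container name is made up.
>>>>>>>>>>>>>>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>>>>>>>>>>>>>> #include <string>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> std::string wrapHealthCheck(
>>>>>>>>>>>>>>>>>>>>>>>>     const std::string& containerName, const std::string& command) {
>>>>>>>>>>>>>>>>>>>>>>>>   return "docker exec " + containerName + " sh -c \" " + command + " \"";
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>>>>>>>>>>>   std::cout << wrapHealthCheck(
>>>>>>>>>>>>>>>>>>>>>>>>                    "mesos-<agent-id>.<container-id>",
>>>>>>>>>>>>>>>>>>>>>>>>                    "curl --silent --fail http://127.0.0.1:8000")
>>>>>>>>>>>>>>>>>>>>>>>>             << std::endl;
>>>>>>>>>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>> ```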
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see
>>>>>>>>>>>>>>>>>>>>>>>> if they ever run the command (in this case `sleep 5`), but have not found
>>>>>>>>>>>>>>>>>>>>>>>> any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
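>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> A minimal sketch of those semantics (illustration only, not Mesos
>>>>>>>>>>>>>>>>>>>>>>>> code): run the command and map exit status 0 to healthy, anything
>>>>>>>>>>>>>>>>>>>>>>>> else to unhealthy.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>> // Minimal sketch of the desired semantics: run the health-check
>>>>>>>>>>>>>>>>>>>>>>>> // command via the shell and map exit status 0 -> healthy, non-zero
>>>>>>>>>>>>>>>>>>>>>>>> // -> unhealthy. (A production version would use fork/exec and
>>>>>>>>>>>>>>>>>>>>>>>> // inspect WEXITSTATUS.)
>>>>>>>>>>>>>>>>>>>>>>>> #include <cstdlib>
>>>>>>>>>>>>>>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>>>>>>>>>>>>>> #include <string>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>>>>>>>>>>>   const std::string command = "exit 1";  // health-check command from TaskInfo
>>>>>>>>>>>>>>>>>>>>>>>>   const int status = std::system(command.c_str());
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   const bool healthy = (status == 0);
>>>>>>>>>>>>>>>>>>>>>>>>   std::cout << "task health: " << (healthy ? "healthy" : "unhealthy")
>>>>>>>>>>>>>>>>>>>>>>>>             << std::endl;
>>>>>>>>>>>>>>>>>>>>>>>>   return healthy ? 0 : 1;
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>> ```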
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
