mesos-user mailing list archives

From: haosdent <haosd...@gmail.com>
Subject: Re: Can health-checks be run by Mesos for docker tasks?
Date: Fri, 09 Oct 2015 02:52:14 GMT
As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
and mesos-docker-executor and mesos-health-check are looked up under this
dir. Although the env var is not propagated, MESOS_LAUNCHER_DIR still works
because flags.launcher_dir is read from it.

For example, I ran
```
export MESOS_LAUNCHER_DIR=/tmp
```
before starting mesos-slave. So when I launched the slave, I could find this
line in the slave log:
```
I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
xxxxx  --launcher_dir="/tmp"
```
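
To illustrate the mechanism, here is a rough standalone sketch (not the actual
stout flag-loading code): the slave reads MESOS_-prefixed environment variables
into the matching flags, and /usr/libexec/mesos is only an assumed default here.
```
#include <cstdlib>
#include <iostream>
#include <string>

int main() {
  // Rough equivalent of the slave picking up MESOS_LAUNCHER_DIR at startup.
  const char* env = std::getenv("MESOS_LAUNCHER_DIR");

  // If the env var is unset, fall back to a build-time default
  // (assumed to be /usr/libexec/mesos for this sketch).
  std::string launcherDir = (env != nullptr) ? env : "/usr/libexec/mesos";

  // This is the value that shows up as --launcher_dir in "Flags at startup".
  std::cout << "--launcher_dir=\"" << launcherDir << "\"" << std::endl;
  return 0;
}
```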

And from your log, I am not sure why your MESOS_LAUNCHER_DIR became the
sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your
other scripts?


On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <outtatime@gmail.com> wrote:

> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>
> I just tried setting both the env var and flag on the slaves, and have
> determined that the env var is not present when it is checked in
> src/docker/executor.cpp @ line 573:
>
>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>   string path =
>>     envPath.isSome() ? envPath.get()
>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ?
>> "yes" : "no") << endl;
>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>
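> In isolation that fallback behaves roughly like the following standalone
> sketch (hypothetical code, not the actual executor). Since argv[0] is just
> "mesos-docker-executor" (see the docker.cpp snippet below), the realpath'd
> dirname resolves to the executor's working directory, which in this case is
> evidently the sandbox:
>
>> #include <cstdlib>
>> #include <iostream>
>> #include <string>
>>
>> // Simplified stand-in for os::realpath(Path(argv[0]).dirname()).
>> static std::string dirnameOf(const std::string& path) {
>>   const auto pos = path.find_last_of('/');
>>   return (pos == std::string::npos) ? std::string(".") : path.substr(0, pos);
>> }
>>
>> int main(int argc, char** argv) {
>>   const char* envPath = std::getenv("MESOS_LAUNCHER_DIR");
>>
>>   // Same shape as the ternary above: prefer MESOS_LAUNCHER_DIR, otherwise
>>   // fall back to the directory the executor binary was invoked from.
>>   std::string path = (envPath != nullptr) ? envPath : dirnameOf(argv[0]);
>>
>>   std::cout << "mesos-health-check would be looked up in: " << path << std::endl;
>>   return 0;
>> }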
>
> Exported the MESOS_LAUNCHER_DIR env var (and verified it is correctly
> propagated up to the point of the mesos-slave launch):
>
> $ cat /etc/default/mesos-slave
>> export
>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>> export MESOS_CONTAINERIZERS="mesos,docker"
>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>> export MESOS_PORT="5050"
>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>
>
> TASK OUTPUT:
>
>
>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>> MESOS_LAUNCHER_DIR:
>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>> Registered docker executor on mesos-worker2a
>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>> Launching health check process:
>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>> --executor=(1)@192.168.225.59:44523
>> --health_check_json={"command":{"shell":true,"value":"docker exec
>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>> sh -c \" \/bin\/bash
>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>> Health check process launched at pid: 2519
>
>
> The env var is not propagated when the docker executor is launched
> in src/slave/containerizer/docker.cpp around line 903:
>
>   vector<string> argv;
>>   argv.push_back("mesos-docker-executor");
>>   // Construct the mesos-docker-executor using the "name" we gave the
>>   // container (to distinguish it from Docker containers not created
>>   // by Mesos).
>>   Try<Subprocess> s = subprocess(
>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>       argv,
>>       Subprocess::PIPE(),
>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>       dockerFlags(flags, container->name(), container->directory),
>>       environment,
>>       lambda::bind(&setup, container->directory));
>
>
> A little ways above, we can see the environment is set up with the container
> task's defined env vars.
>
> See src/slave/containerizer/docker.cpp around line 871:
>
>   // Include any enviroment variables from ExecutorInfo.
>>   foreach (const Environment::Variable& variable,
>>            container->executor.command().environment().variables()) {
>>     environment[variable.name()] = variable.value();
>>   }
>
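> As a purely hypothetical sketch (not a patch against the actual code), the
> kind of change that would make the executor-side lookup work is to also merge
> a slave-side MESOS_LAUNCHER_DIR entry into that environment map before the
> subprocess call, alongside the ExecutorInfo variables:
>
>> #include <iostream>
>> #include <map>
>> #include <string>
>>
>> int main() {
>>   std::map<std::string, std::string> environment;
>>
>>   // Variables copied from ExecutorInfo (as in the loop around line 871).
>>   environment["MESOS_TASK_ID"] =
>>       "hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253";
>>
>>   // Assumed addition: forward the slave's --launcher_dir value so that
>>   // os::getenv("MESOS_LAUNCHER_DIR") in the docker executor succeeds.
>>   environment["MESOS_LAUNCHER_DIR"] = "/usr/libexec/mesos";
>>
>>   for (const auto& entry : environment) {
>>     std::cout << entry.first << "=" << entry.second << std::endl;
>>   }
>>   return 0;
>> }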
>
> Should I file a JIRA for this?  Have I overlooked anything?
>
>
> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <haosdent@gmail.com> wrote:
>
>> >Not sure what was going on with health-checks in 0.24.0.
>> 0.24.1 should work.
>>
>> >Do any of you know which host the path
>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>> should exist on? It definitely doesn't exist on the slave, hence execution
>> failing.
>>
>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get
>> mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or use the same
>> dir as mesos-docker-executor.
>>
>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>
>>> Maybe I spoke too soon.
>>>
>>> Now the checks are attempting to run; however, the STDERR is not looking
>>> good.  I've added some debugging to the error message output to show the
>>> path, argv, and envp variables:
>>>
>>> STDOUT:
>>>
>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker2a
>>>> Starting task
>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>> Launching health check process:
>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>> --executor=(1)@192.168.225.59:43917
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>> sh -c \" exit 1
>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>> Health check process launched at pid: 3012
>>>
>>>
>>> STDERR:
>>>
>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>> limited without swap.
>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>> envp=''): No such file or directory
>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you
>>>> are using GNU date ***
>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
>>>> 3012; stack trace: ***
>>>> @ 0x7f4a38265340 (unknown)
>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>> @ 0x7f4a37eca0d8 (unknown)
>>>> @ 0x4191e2 _Abort()
>>>> @ 0x41921c _Abort()
>>>> @ 0x7f4a39dc2768 process::childMain()
>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>> @ 0x43cc9c
>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>> @ 0x7f4a39d92827
>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>> @ 0x7f4a38a47e40 (unknown)
>>>> @ 0x7f4a3825d182 start_thread
>>>> @ 0x7f4a37f8a47d (unknown)
>>>
>>>
>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>> should exist on? It definitely doesn't exist on the slave, hence
>>> execution failing.
>>>
>>> This is with current master, git hash
>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>
>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>> Author: Anand Mazumdar <mazumdar.anand@gmail.com>
>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>
>>>
>>> -Jay
>>>
>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>>
>>>> Update:
>>>>
>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>> health checks are working as advertised in both Marathon and my own
>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>
>>>> Anyways, thanks again for your help Haosdent!
>>>>
>>>> Cheers,
>>>> Jay
>>>>
>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <outtatime@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Haosdent,
>>>>>
>>>>> Can you share your Marathon POST request that results in Mesos
>>>>> executing the health checks?
>>>>>
>>>>> Since we can reference the Marathon framework, I've been doing some
>>>>> digging around.
>>>>>
>>>>> Here are the details of my setup and findings:
>>>>>
>>>>> I put a few small hacks in Marathon:
>>>>>
>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>
>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>>>> in both the TaskFactory as well as right before the task is sent to Mesos
>>>>> via driver.launchTasks:
>>>>>
>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>
>>>>> $ git diff
>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>
>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>        case (taskInfo, ports) =>
>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>> +        import java.io._
>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>> +        bw.write("\n")
>>>>>> +        bw.close()
>>>>>>          CreatedTask(
>>>>>>            taskInfo,
>>>>>>            MarathonTasks.makeTask(
>>>>>
>>>>>
>>>>>
>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>
>>>>> $ git diff
>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>        import scala.collection.JavaConverters._
>>>>>> +      var i = 0
>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>> +        import java.io._
>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>>>>> taskInfos(i).getTaskId.getValue)
>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>> +        bw.write("\n")
>>>>>> +        bw.close()
>>>>>> +      }
>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>> taskInfos.asJava)
>>>>>>      }
>>>>>
>>>>>
>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>> marathon service.
>>>>>
>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>
>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>>> application/json' -d'
>>>>>> {
>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>   "apps": [
>>>>>>     {
>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>       "container": {
>>>>>>         "type": "DOCKER",
>>>>>>         "docker": {
>>>>>>           "image":
>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>           "network": "BRIDGE",
>>>>>>           "portMappings": [
>>>>>>             {
>>>>>>               "containerPort": 8000,
>>>>>>               "hostPort": 0,
>>>>>>               "protocol": "tcp"
>>>>>>             }
>>>>>>           ]
>>>>>>         }
>>>>>>       },
>>>>>>       "env": {
>>>>>>
>>>>>>       },
>>>>>>       "healthChecks": [
>>>>>>         {
>>>>>>           "protocol": "COMMAND",
>>>>>>           "command": {"value": "exit 1"},
>>>>>>           "gracePeriodSeconds": 10,
>>>>>>           "intervalSeconds": 10,
>>>>>>           "timeoutSeconds": 10,
>>>>>>           "maxConsecutiveFailures": 3
>>>>>>         }
>>>>>>       ],
>>>>>>       "instances": 1,
>>>>>>       "cpus": 1,
>>>>>>       "mem": 512
>>>>>>     }
>>>>>>   ]
>>>>>> }
>>>>>
>>>>>
>>>>> $ ls /tmp/
>>>>>>
>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>
>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>>
>>>>> Do they match?
>>>>>
>>>>> $ md5sum /tmp/task*
>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>>
>>>>> Yes, so I am confident this is the information being sent across the
>>>>> wire to Mesos.
>>>>>
>>>>> Do they contain any health-check information?
>>>>>
>>>>> $ cat
>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> {
>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>   "task_id":{
>>>>>>
>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>   },
>>>>>>   "slave_id":{
>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>   },
>>>>>>   "resources":[
>>>>>>     {
>>>>>>       "name":"cpus",
>>>>>>       "type":"SCALAR",
>>>>>>       "scalar":{
>>>>>>         "value":1.0
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     },
>>>>>>     {
>>>>>>       "name":"mem",
>>>>>>       "type":"SCALAR",
>>>>>>       "scalar":{
>>>>>>         "value":512.0
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     },
>>>>>>     {
>>>>>>       "name":"ports",
>>>>>>       "type":"RANGES",
>>>>>>       "ranges":{
>>>>>>         "range":[
>>>>>>           {
>>>>>>             "begin":31641,
>>>>>>             "end":31641
>>>>>>           }
>>>>>>         ]
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     }
>>>>>>   ],
>>>>>>   "command":{
>>>>>>     "environment":{
>>>>>>       "variables":[
>>>>>>         {
>>>>>>           "name":"PORT_8000",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"HOST",
>>>>>>           "value":"mesos-worker1a"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>
>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>
>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORT",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORTS",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORT0",
>>>>>>           "value":"31641"
>>>>>>         }
>>>>>>       ]
>>>>>>     },
>>>>>>     "shell":false
>>>>>>   },
>>>>>>   "container":{
>>>>>>     "type":"DOCKER",
>>>>>>     "docker":{
>>>>>>
>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>       "network":"BRIDGE",
>>>>>>       "port_mappings":[
>>>>>>         {
>>>>>>           "host_port":31641,
>>>>>>           "container_port":8000,
>>>>>>           "protocol":"tcp"
>>>>>>         }
>>>>>>       ],
>>>>>>       "privileged":false,
>>>>>>       "force_pull_image":false
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>
>>>>>
>>>>> No, I don't see anything about any health check.
>>>>>
>>>>> Mesos STDOUT for the launched task:
>>>>>
>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task
>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>>
>>>>> And STDERR:
>>>>>
>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>>> limited without swap.
>>>>>
>>>>>
>>>>> Again, nothing about any health checks.
>>>>>
>>>>> Any ideas of other things to try or what I could be missing?  Can't
>>>>> say either way about the Mesos health-check system working or not if
>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>
>>>>> Thanks for all your help!
>>>>>
>>>>> Best,
>>>>> Jay
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>
>>>>>> Maybe you could post your executor stdout/stderr so that we could
>>>>>> know whether the health check is running or not.
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>
>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>>>> could see a log like this in the executor stdout.
>>>>>>>
>>>>>>> ```
>>>>>>> Registered docker executor on xxxxx
>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>> Launching health check process:
>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>> Health check process launched at pid: 9895
>>>>>>> Received task health update, healthy: true
>>>>>>> ```
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <outtatime@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Yes, launch the health check task through its definition in TaskInfo. Do
>>>>>>>> you launch your task through Marathon? I could test it on my side.
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <outtatime@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>> through a custom executor.
>>>>>>>>>
>>>>>>>>> With that being said it is a pretty good sized code base and I'm
>>>>>>>>> not very familiar with it, so my analysis this far has by no means been
>>>>>>>>> exhaustive.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> When the health check launches, there will be a log like this in your
>>>>>>>>> executor stdout:
>>>>>>>>> ```
>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <outtatime@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you can
>>>>>>>>>> see the unhealthy status in your task stdout/stderr?
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <outtatime@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <haosdent@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <haosdent@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <jay@jaytaylor.com
>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's
>>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with the command
>>>>>>>>>>>>>>>> you provided as health checks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It should be in the next release.
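>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Roughly speaking (illustrative sketch only, with made-up names), the
>>>>>>>>>>>>>>>> health check command gets wrapped along these lines:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>>>>>>> #include <string>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>>>>   // Made-up names purely for illustration.
>>>>>>>>>>>>>>>>>   std::string container = "mesos-SLAVEID.CONTAINERID";
>>>>>>>>>>>>>>>>>   std::string userCommand = "exit 1";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   // The user's COMMAND health check is rewritten to run inside the
>>>>>>>>>>>>>>>>>   // task's docker container.
>>>>>>>>>>>>>>>>>   std::string wrapped =
>>>>>>>>>>>>>>>>>       "docker exec " + container + " sh -c \" " + userCommand + " \"";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   std::cout << wrapped << std::endl;
>>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>>> }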
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <outtatime@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Haosdent Huang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang
