mesos-user mailing list archives

From Jay Taylor <outtat...@gmail.com>
Subject Re: Can health-checks be run by Mesos for docker tasks?
Date Fri, 09 Oct 2015 04:03:34 GMT
I see.  And then how are the flags sent to the executor?



> On Oct 8, 2015, at 8:56 PM, haosdent <haosdent@gmail.com> wrote:
> 
> Yes. The related code is located in https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
> 
> In fact, environment variables starting with MESOS_ are loaded as flag values.
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
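> 
> To illustrate the idea (a rough sketch only, not the actual stout code at the link above): the MESOS_ prefix is stripped, the rest of the name is lowercased, and the result is used as the flag name, so MESOS_LAUNCHER_DIR becomes the launcher_dir flag.
> 
> ```
> // Sketch only; see the stout flags.hpp link above for the real loader.
> // Assumes a POSIX-style `environ`.
> #include <cctype>
> #include <map>
> #include <string>
> 
> extern char** environ;
> 
> std::map<std::string, std::string> loadFlagsFromEnvironment(
>     const std::string& prefix = "MESOS_")
> {
>   std::map<std::string, std::string> flags;
>   for (char** env = environ; *env != nullptr; ++env) {
>     const std::string entry(*env);             // e.g. "MESOS_LAUNCHER_DIR=/tmp"
>     const size_t eq = entry.find('=');
>     if (eq == std::string::npos ||
>         entry.compare(0, prefix.size(), prefix) != 0) {
>       continue;                                // not a MESOS_-prefixed variable
>     }
>     std::string name = entry.substr(prefix.size(), eq - prefix.size());
>     for (char& c : name) {
>       c = std::tolower(c);                     // "LAUNCHER_DIR" -> "launcher_dir"
>     }
>     flags[name] = entry.substr(eq + 1);        // flags["launcher_dir"] = "/tmp"
>   }
>   return flags;
> }
> ```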
> 
>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <outtatime@gmail.com> wrote:
>> One question for you haosdent-
>> 
>> You mentioned that the flags.launcher_dir should propagate to the docker executor all the way up the chain.  Can you show me where this logic is in the codebase?  I didn't see where that was happening and would like to understand the mechanism.
>> 
>> Thanks!
>> Jay
>> 
>> 
>> 
>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>> 
>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the broken behavior experienced today still persists.
>>> 
>>>> On Oct 8, 2015, at 7:52 PM, haosdent <haosdent@gmail.com> wrote:
>>>> 
>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, which is used to find mesos-docker-executor and mesos-health-check under that dir. Although the env var is not propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is read from it.
>>>> 
>>>> For example, I ran
>>>> ```
>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>> ```
>>>> before starting mesos-slave, so when the slave launches I can find this line in the slave log:
>>>> ```
>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  --launcher_dir="/tmp"
>>>> ```
>>>> 
>>>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your other scripts?
>>>> 
>>>> 
>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>> 
>>>>> I just tried setting both the env var and flag on the slaves, and have determined that the env var is not present when it is checked in src/docker/executor.cpp @ line 573:
>>>>> 
>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>   string path =
>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>> 
>>>>> 
>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly propagated up to the point of the mesos-slave launch):
>>>>> 
>>>>>> $ cat /etc/default/mesos-slave
>>>>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>> export MESOS_PORT="5050"
>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>> 
>>>>> TASK OUTPUT:
>>>>> 
>>>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>> Launching health check process: /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check --executor=(1)@192.168.225.59:44523 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>> Health check process launched at pid: 2519
>>>>> 
>>>>> 
>>>>> The env var is not propagated when the docker executor is launched in src/slave/containerizer/docker.cpp around line 903:
>>>>> 
>>>>>>   vector<string> argv;
>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>   // by Mesos).
>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>       argv,
>>>>>>       Subprocess::PIPE(),
>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>       environment,
>>>>>>       lambda::bind(&setup, container->directory));
>>>>> 
>>>>> 
>>>>> A little way above, we can see the environment is set up with the container task's defined env vars.
>>>>> 
>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>> 
>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>            container->executor.command().environment().variables()) {
>>>>>>     environment[variable.name()] = variable.value();
>>>>>>   }
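>>>>> 
>>>>> As an illustration only (this is not actual Mesos code): the kind of one-line change I would expect here, assuming `environment` is the map handed to `subprocess` above and `flags` is the slave's flags object:
>>>>> 
>>>>>>   // Hypothetical sketch: forward the slave's launcher_dir so that the
>>>>>>   // os::getenv("MESOS_LAUNCHER_DIR") check in src/docker/executor.cpp
>>>>>>   // (quoted earlier) finds it in the executor's environment.
>>>>>>   environment["MESOS_LAUNCHER_DIR"] = flags.launcher_dir;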
>>>>> 
>>>>> 
>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>> 
>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>> 0.24.1 should work.
>>>>>> 
>>>>>> >Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>>> 
>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise use the same dir as mesos-docker-executor.
>>>>>> 
>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>> Maybe I spoke too soon.
>>>>>>> 
>>>>>>> Now the checks are attempting to run; however, the STDERR is not looking good.  I've added some debugging to the error message output to show the path, argv, and envp variables:
>>>>>>> 
>>>>>>> STDOUT:
>>>>>>> 
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>> Launching health check process: /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check --executor=(1)@192.168.225.59:43917 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>> Health check process launched at pid: 3012
>>>>>>> 
>>>>>>> 
>>>>>>> STDERR:
>>>>>>> 
>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', envp=''): No such file or directory*** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>> @ 0x41921c _Abort()
>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>> 
>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>>>> 
>>>>>>> This is with current master, git hash 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>> 
>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>> Author: Anand Mazumdar <mazumdar.anand@gmail.com>
>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>> 
>>>>>>> 
>>>>>>> -Jay
>>>>>>> 
>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>> Update:
>>>>>>>> 
>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and package the latest master (0.26.x) and deployed it to the cluster, and now health checks are working as advertised in both Marathon and my own framework!  Not sure what was going on with health-checks in 0.24.0.
>>>>>>>> 
>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Jay
>>>>>>>> 
>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>> Hi Haosdent,
>>>>>>>>> 
>>>>>>>>> Can you share your Marathon POST request that results in Mesos executing the health checks?
>>>>>>>>> 
>>>>>>>>> Since we can reference the Marathon framework, I've been doing some digging around.
>>>>>>>>> 
>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>> 
>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>> 
>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>>> 
>>>>>>>>> (2) Edited the following files so the TaskInfo is dumped as JSON to /tmp/X both in the TaskFactory as well as right before the task is sent to Mesos via driver.launchTasks:
>>>>>>>>> 
>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>> 
>>>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>> 
>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>> +        import java.io._
>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>> +        bw.close()
>>>>>>>>>>          CreatedTask(
>>>>>>>>>>            taskInfo,
>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>> 
>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>> 
>>>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>> +      var i = 0
>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>> +        import java.io._
>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>> +        bw.close()
>>>>>>>>>> +      }
>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>>>      }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the marathon service.
>>>>>>>>> 
>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>>>>> 
>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>>>>>> {
>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>   "apps": [
>>>>>>>>>>     {
>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>       "container": {
>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>         "docker": {
>>>>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>           "portMappings": [
>>>>>>>>>>             {
>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>             }
>>>>>>>>>>           ]
>>>>>>>>>>         }
>>>>>>>>>>       },
>>>>>>>>>>       "env": {
>>>>>>>>>>         
>>>>>>>>>>       },
>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>         {
>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>         }
>>>>>>>>>>       ],
>>>>>>>>>>       "instances": 1,
>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>       "mem": 512
>>>>>>>>>>     }
>>>>>>>>>>   ]
>>>>>>>>>> }'
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> $ ls /tmp/
>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 
>>>>>>>>> Do they match?
>>>>>>>>> 
>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 
>>>>>>>>> Yes, so I am confident this is the information being sent across the wire to Mesos.
>>>>>>>>> 
>>>>>>>>> Do they contain any health-check information?
>>>>>>>>> 
>>>>>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> {
>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>   "task_id":{
>>>>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>   },
>>>>>>>>>>   "slave_id":{
>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>   },
>>>>>>>>>>   "resources":[
>>>>>>>>>>     {
>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>       "scalar":{
>>>>>>>>>>         "value":1.0
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     },
>>>>>>>>>>     {
>>>>>>>>>>       "name":"mem",
>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>       "scalar":{
>>>>>>>>>>         "value":512.0
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     },
>>>>>>>>>>     {
>>>>>>>>>>       "name":"ports",
>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>       "ranges":{
>>>>>>>>>>         "range":[
>>>>>>>>>>           {
>>>>>>>>>>             "begin":31641,
>>>>>>>>>>             "end":31641
>>>>>>>>>>           }
>>>>>>>>>>         ]
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     }
>>>>>>>>>>   ],
>>>>>>>>>>   "command":{
>>>>>>>>>>     "environment":{
>>>>>>>>>>       "variables":[
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         }
>>>>>>>>>>       ]
>>>>>>>>>>     },
>>>>>>>>>>     "shell":false
>>>>>>>>>>   },
>>>>>>>>>>   "container":{
>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>     "docker":{
>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>         {
>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>         }
>>>>>>>>>>       ],
>>>>>>>>>>       "privileged":false,
>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>> 
>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>> 
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And STDERR:
>>>>>>>>> 
>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>> 
>>>>>>>>> Any ideas for other things to try, or what I could be missing?  I can't say either way whether the Mesos health-check system is working if Marathon won't put the health check into the task it sends to Mesos.
>>>>>>>>> 
>>>>>>>>> Thanks for all your help!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jay
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell whether the health check is running or not.
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I see a log like this in the executor stdout.
>>>>>>>>>>> 
>>>>>>>>>>> ```
>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>> Launching health check process: /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>> ```
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>>>>> I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, the health check is launched through its definition in the TaskInfo. Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With that being said, it is a pretty good-sized code base and I'm not very familiar with it, so my analysis thus far has by no means been exhaustive.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> When the health check launches, there will be a log line like this in your executor stdout:
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>> I'm happy to try this; however, wouldn't there be output in the logs with the string "health" or "Health" if the health check were active?  None of my master or slave logs contain the string.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you can see an unhealthy status in your task stdout/stderr?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me double check.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there :)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
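>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> (Purely to illustrate what I mean, not how Mesos implements it: a tiny sketch that runs a shell command and maps its exit status to healthy/unhealthy.)
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>> // Sketch only: exit status 0 => healthy, anything else => unhealthy.
>>>>>>>>>>>>>>>>>>>>>>>> #include <cstdlib>
>>>>>>>>>>>>>>>>>>>>>>>> #include <string>
>>>>>>>>>>>>>>>>>>>>>>>> #include <sys/wait.h>
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> bool runHealthCheck(const std::string& command)
>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>   // e.g. command = "sleep 5" or "exit 1" from the health_check above.
>>>>>>>>>>>>>>>>>>>>>>>>   const int status = std::system(command.c_str());
>>>>>>>>>>>>>>>>>>>>>>>>   return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>> ```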
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -- 
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Haosdent Huang
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Best Regards,
>>>> Haosdent Huang
> 
> 
> 
> -- 
> Best Regards,
> Haosdent Huang
