mesos-dev mailing list archives

From "Chawla,Sumit " <sumitkcha...@gmail.com>
Subject Re: Mesos Executor Failing
Date Tue, 23 May 2017 04:30:59 GMT
Hi Joseph

I am using 0.27.0.  Is there any diagnostic tool or command that I can
run to find out why this is happening?

Regards
Sumit Chawla


On Fri, May 19, 2017 at 2:31 PM, Joseph Wu <joseph@mesosphere.io> wrote:

> What version of Mesos are you using?  (Just based on the word "slave" in
> that error message, I'm guessing 0.28 or older.)
>
> The "Failed to synchronize" error is something that can occur while the
> agent is launching the executor.  During the launch, the agent will create
> a pipe to the executor subprocess; and the executor makes a blocking read
> on this pipe.  The agent will write a value to the pipe to signal the
> executor to proceed.  If the agent restarts or the pipe breaks at this
> point in the launch, then you'll see this error message.
>
> On Thu, May 18, 2017 at 9:44 PM, Chawla,Sumit <sumitkchawla@gmail.com>
> wrote:
>
>> Hi
>>
>> I am facing a peculiar issue on one of the slave nodes of our cluster.  I
>> have a spark cluster with 40+ nodes.  On one of the nodes, all tasks fail
>> with exit code 0.
>>
>> ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76
>> exited caused by one of the running tasks) Reason: Unknown executor exit
>> code (0)
>>
>>
>> I cannot find anything in the mesos-slave logs, and nothing is
>> being written to stdout/stderr.  Are there any debugging utilities that I
>> can use to find out what is going wrong on that particular slave?
>>
>> I tried running the following, but got stuck at this error:
>>
>>
>> /mesos-containerizer launch --command='{"environment":{},"shell":true,"value":"ls
>> -ltr"}' --directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-b6eb-
>> 8fa4d2539da7-S77/frameworks/e6745c67-32e8-41ad-b6eb-
>> 8fa4d2539da7-0312/executors/e6745c67-32e8-41ad-b6eb-
>> 8fa4d2539da7-S77/runs/45aa784c-f485-46a6-aeb8-997e82b80c4f --help=false
>> --pipe_read=0 --pipe_write=0 --user=smi
>>
>> Failed to synchronize with slave (it's probably exited)
>>
>>
>> I would appreciate pointers to any debugging methods or documentation for
>> diagnosing these kinds of problems.
>>
>> Regards
>> Sumit Chawla
>>
>>
>
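
The pipe handshake described in the quoted reply above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the Mesos implementation: the names `launch` and `CHILD`, the one-byte signal, and the exit codes are all assumptions made for the example. The parent plays the role of the agent, and the child plays the role of the executor launcher that blocks on the read end of the pipe.

```python
import os
import subprocess
import sys

# Hypothetical child program: blocks on the pipe until the parent
# either writes one byte (proceed) or closes the pipe (EOF, give up).
CHILD = r"""
import os, sys
fd = int(sys.argv[1])
data = os.read(fd, 1)  # blocking read on the inherited pipe
if len(data) != 1:
    sys.stderr.write("Failed to synchronize with parent\n")
    sys.exit(1)
sys.exit(0)
"""


def launch(signal_child: bool):
    """Start the child and optionally signal it to proceed.

    Returns (exit_code, stderr_text) of the child.
    """
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(read_fd)],
        pass_fds=(read_fd,),      # child inherits only the read end
        stderr=subprocess.PIPE,
        text=True,
    )
    os.close(read_fd)             # parent keeps only the write end
    if signal_child:
        os.write(write_fd, b"\0")  # tell the child to continue
    os.close(write_fd)            # without a prior write, child sees EOF
    _, err = proc.communicate()
    return proc.returncode, err


if __name__ == "__main__":
    print(launch(True))   # parent signals, child proceeds
    print(launch(False))  # pipe closes unsignaled, child reports failure
```

In this sketch, closing the write end without writing stands in for the agent restarting or the pipe breaking during the launch: the child's blocking read returns no data and it exits with the synchronization error instead of running the command.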
