mesos-dev mailing list archives

From "Chawla,Sumit " <sumitkcha...@gmail.com>
Subject Re: Mesos Executor Failing
Date Tue, 06 Jun 2017 02:13:45 GMT
Hi Joseph

The error code is being reported as 0, and there is not much else in the
logs.

Regards
Sumit Chawla


On Wed, May 24, 2017 at 12:21 AM, Joseph Wu <joseph@mesosphere.io> wrote:

> There isn't a tool for this.  Can you check if the Mesos agent is being
> restarted (or crashing) when you launch a task?  And perhaps upload some
> logs around the time of the task launch.
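One way to run the restart check described above is to snapshot the agent's PID and start time before and after launching a task. The sketch below assumes a Linux host with procps `ps` and an agent binary named `mesos-slave` (the 0.27-era name). The helper names (`parse_ps`, `agent_snapshot`, `agent_restarted`) are hypothetical. A changed PID or start time means the agent restarted in between. An empty snapshot afterwards means it is no longer running.

```python
import subprocess


def parse_ps(output: str) -> dict:
    """Parse `ps -o pid=,lstart=` output into {pid: start_time_string}."""
    procs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        pid, _, started = line.partition(" ")
        # Normalize the padding ps puts inside dates such as "Jun  6".
        procs[int(pid)] = " ".join(started.split())
    return procs


def agent_snapshot(name: str = "mesos-slave") -> dict:
    """Snapshot running agent processes (hypothetical helper; assumes procps ps)."""
    # ps exits non-zero when no process matches, so the return code is ignored.
    result = subprocess.run(
        ["ps", "-C", name, "-o", "pid=,lstart="],
        capture_output=True,
        text=True,
    )
    return parse_ps(result.stdout)


def agent_restarted(before: dict, after: dict) -> bool:
    """True if the agent processes (PID plus start time) changed between snapshots."""
    return before != after
```

Usage: take `before = agent_snapshot()`, launch a task on the affected node, take `after = agent_snapshot()`, and check `agent_restarted(before, after)`.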
>
> There is a mismatch between the exit codes you've reported, though.  When
> you see that log line in the sandbox logs, the exit code will be "1"
> (failure), rather than "0" (success).
>
> On Mon, May 22, 2017 at 9:30 PM, Chawla,Sumit <sumitkchawla@gmail.com>
> wrote:
>
>> Hi Joseph
>>
>> I am using 0.27.0.  Is there any diagnostic tool or command that I
>> can run to find out why it's happening?
>>
>> Regards
>> Sumit Chawla
>>
>>
>> On Fri, May 19, 2017 at 2:31 PM, Joseph Wu <joseph@mesosphere.io> wrote:
>>
>>> What version of Mesos are you using?  (Just based on the word "slave" in
>>> that error message, I'm guessing 0.28 or older.)
>>>
>>> The "Failed to synchronize" error is something that can occur while the
>>> agent is launching the executor.  During the launch, the agent will create
>>> a pipe to the executor subprocess, and the executor makes a blocking read
>>> on this pipe.  The agent will write a value to the pipe to signal the
>>> executor to proceed.  If the agent restarts or the pipe breaks at this
>>> point in the launch, then you'll see this error message.
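Here is a minimal Python sketch of the handshake described above. It is an illustration, not Mesos's actual C++ launcher. A thread stands in for the executor subprocess. The function names (`launch`, `executor_wait_for_agent`) and the single-byte signal are assumptions. The key behaviour is that a blocking read on a pipe returns end-of-file once every write end is closed. If the agent goes away before writing, the executor has nothing to read, so it reports the error and exits with code 1.

```python
import os
import sys
import threading

SYNC_ERROR = "Failed to synchronize with slave (it's probably exited)"


def executor_wait_for_agent(read_fd: int) -> int:
    """Executor side: block on the pipe until the agent signals; return an exit code."""
    data = os.read(read_fd, 1)  # blocks until a byte arrives or all write ends close
    os.close(read_fd)
    if len(data) != 1:
        # EOF without a signal byte: the agent side went away before signalling.
        print(SYNC_ERROR, file=sys.stderr)
        return 1
    return 0  # signalled; a real launcher would now go on to run the task command


def launch(agent_signals: bool) -> int:
    """Simulate one launch; a thread plays the role of the executor subprocess."""
    read_fd, write_fd = os.pipe()
    result = {}

    def run_executor():
        result["exit_code"] = executor_wait_for_agent(read_fd)

    executor = threading.Thread(target=run_executor)
    executor.start()
    if agent_signals:
        os.write(write_fd, b"\0")  # the "proceed" signal
    # Closing without a prior write simulates the agent exiting mid-launch.
    os.close(write_fd)
    executor.join()
    return result["exit_code"]
```

In a real fork-based launch, the child must close its own copy of the write end. Otherwise, its read would never see end-of-file, even after the parent exits.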
>>>
>>> On Thu, May 18, 2017 at 9:44 PM, Chawla,Sumit <sumitkchawla@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I am facing a peculiar issue on one of the slave nodes of our cluster.
>>>> I have a Spark cluster with 40+ nodes.  On one of the nodes, all tasks fail
>>>> with exit code 0.
>>>>
>>>> ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76
>>>> exited caused by one of the running tasks) Reason: Unknown executor
>>>> exit code (0)
>>>>
>>>>
>>>> I cannot seem to find anything in mesos-slave.logs, and there is
>>>> nothing being written to stdout/stderr.  Are there any debugging utilities
>>>> that I can use to figure out what is going wrong on that particular slave?
>>>>
>>>>
>>>> I tried running the following but got stuck at:
>>>>
>>>>
>>>> /mesos-containerizer launch \
>>>>   --command='{"environment":{},"shell":true,"value":"ls -ltr"}' \
>>>>   --directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/frameworks/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-0312/executors/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/runs/45aa784c-f485-46a6-aeb8-997e82b80c4f \
>>>>   --help=false --pipe_read=0 --pipe_write=0 --user=smi
>>>>
>>>> Failed to synchronize with slave (it's probably exited)
>>>>
>>>>
>>>> I would appreciate any pointers to debugging methods/documentation for
>>>> diagnosing these kinds of problems.
>>>>
>>>> Regards
>>>> Sumit Chawla
>>>>
>>>>
>>>
>>
>
