From: Benjamin Mahler
Date: Mon, 7 Mar 2016 17:54:24 -0800
To: user@mesos.apache.org
Subject: Re: mesos agent not recovering after ZK init failure

Very surprising.. I don't have any ideas other than trying to replicate the scenario in a test.

Please do keep us posted if you encounter it again and gain more information.

On Fri, Feb 26, 2016 at 4:34 PM, Sharma Podila <spodila@netflix.com> wrote:
MESOS-4795 created.

I don't have the exit status. We haven't seen a repeat yet, will catch the exit status next time it happens.

Yes, removing the metadata directory was the only way it was resolved. This happened on multiple hosts requiring the same resolution.
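
For catching the exit status next time: if the existing keep-alive mechanism is, or can be wrapped by, a simple fork/exec supervisor, the child's exit code or terminating signal can be recorded with waitpid(). This is only a minimal sketch under that assumption (a process manager such as systemd already records this in its status output):

// Minimal keep-alive wrapper sketch (assumption: the supervisor is a plain fork/exec
// loop). It records how mesos-slave exited so a "silent" death still leaves an exit
// code or signal behind.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>

int main() {
  for (;;) {
    pid_t pid = fork();
    if (pid == -1) {
      std::perror("fork");
      return 1;
    }
    if (pid == 0) {
      // Hypothetical agent invocation; real flags elided.
      execlp("mesos-slave", "mesos-slave", static_cast<char*>(nullptr));
      _exit(127);  // exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
      std::perror("waitpid");
      return 1;
    }
    if (WIFEXITED(status)) {
      std::fprintf(stderr, "mesos-slave exited with status %d\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
      std::fprintf(stderr, "mesos-slave killed by signal %d\n", WTERMSIG(status));
    }

    sleep(5);  // back off before restarting
  }
}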


On Thu, Feb 25, 2016 at 6:37 PM, Benjamin Mahler <bmahler@apache.org> wrote:
Feel free to create one. I don't have enough information to know what the issue is without doing some further investigation, but if the situation you described is accurate it seems like there are two strange bugs:

-the silent exit (do you not have the exit status?), and
-the flapping from ZK errors that needed the metadata directory to be removed to resolve (are you convinced the removal of the meta directory is what solved it?)

It would be good to track these issues in case they crop up again.

On Tue, Feb 23, 2016 at 2:51 PM, Sharma Podila <spodila@netflix.com> wrote:
Hi Ben,

Let me know if there is a new issue created for this; I would like to add myself to watch it.
Thanks.



On Wed, Feb 10, 2016 at 9:54 AM, Sharma Podila <spodila@netflix.com> wrote:
Hi Ben,
That is accurate, with one additional line:
-Agent running fine with 0.24.1
-Transient ZK issues, slave flapping with zookeeper_init failure
-ZK issue resolved
-Most agents stop flapping and function correctly
-Some agents continue flapping, but silent exit after printing the detector.cpp:481 log line.
-The agents that continued to flap were repaired with manual removal of contents in mesos-slave's working dir



On Wed, Feb 10, 2016 at 9:43 AM, Benjamin Mahler <bmahler@apache.org> wrote:
Hey Sharma,

I didn't quite follow the timeline of events here or how the agent logs you posted fit into it. Here's how I interpreted it:

-Agent running fine with 0.24.1
-Transient ZK issues, slave flapping with zookeeper_init failure
-ZK issue resolved
-Most agents stop flapping and function correctly
-Some agents continue flapping, but silent exit after printing the detector.cpp:481 log line.

Is this accurate? What is the exit code from the silent exit?

On Tue, Feb 9, 2016 at 9:09 PM, Sharma Podila <spodila@netflix.com> wrote:
Maybe related, but maybe different, since a new process seems to find the master leader and still aborts, never recovering with restarts until work dir data is removed.
It is happening in 0.24.1.



On Tue, Feb 9, 2016 at 11:53 AM, Vinod Kone <vinodkone@apache.org> wrote:
MESOS-1326 was fixed in 0.19.0 (set the fix version now). But I guess you are saying it is somehow related but not exactly the same issue?

On Tue, Feb 9, 2016 at 11:46 AM, Raúl Gutiérrez Segalés <rgs@itevenworks.net> wrote:
On 9 February 2016 at 11:04, Sharma Podila <spodila@netflix.com> wrote:
We had a few Mesos agents stuck in an unrecoverable state after a transient ZK init error. Is this a known problem? I wasn't able to find an existing JIRA item for this. We are on 0.24.1 at this time.

Most agents were fine, except a handful. These agents had their mesos-slave process constantly restarting. The .INFO logfile had the contents below before the process exited, with no error messages. The restarts were happening constantly due to an existing service keep-alive strategy.

To fix it, we manually stopped the service, removed the data in the working dir, and then restarted it. The mesos-slave process was then able to restart. The manual intervention needed to resolve this is problematic.
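
The cleanup described above amounts to roughly the following sketch. It is an illustration only, not an official procedure: it assumes the mesos-slave service has already been stopped, that the agent work dir is /mnt/data/mesos (as in the logs that follow), and that clearing the checkpointed meta/ state is acceptable, which also means the agent will not recover any tasks that were running on it.

// Illustrative cleanup of the agent's checkpointed state (assumptions: service already
// stopped, work dir is /mnt/data/mesos as in the logs below). Removing meta/ forces the
// agent to start fresh instead of attempting recovery from the old checkpoint.
#include <filesystem>
#include <iostream>
#include <system_error>

int main() {
  namespace fs = std::filesystem;

  const fs::path meta_dir = "/mnt/data/mesos/meta";  // <work_dir>/meta

  std::error_code ec;
  const auto removed = fs::remove_all(meta_dir, ec);  // recursive delete
  if (ec) {
    std::cerr << "Failed to remove " << meta_dir << ": " << ec.message() << "\n";
    return 1;
  }
  std::cout << "Removed " << removed << " entries under " << meta_dir << "\n";
  return 0;
}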

Here are the contents of the various log files on the agent:

The .INFO logfile for one of the restarts before the mesos-slave process exited with no other error messages:

Log file created at: 2016/02/09 02:12:48
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0209 02:12:48.502403 97255 logging.cpp:172] INFO level logging started!
I0209 02:12:48.502938 97255 main.cpp:185] Build: 2015-09-30 16:12:07 by builds
I0209 02:12:48.502974 97255 main.cpp:187] Version: 0.24.1
I0209 02:12:48.503288 97255 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
I0209 02:12:48.507961 97255 main.cpp:272] Starting Mesos slave
I0209 02:12:48.509827 97296 slave.cpp:190] Slave started on 1)@10.138.146.230:7101
I0209 02:12:48.510074 97296 slave.cpp:191] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --attributes="region:us-east-1;<snip>" --authenticatee="<snip>" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" <snip>
I0209 02:12:48.511706 97296 slave.cpp:354] Slave resources: ports(*):[7150-7200]; mem(*):240135; cpus(*):32; disk(*):586104
I0209 02:12:48.512320 97296 slave.cpp:384] Slave hostname: <snip>
I0209 02:12:48.512368 97296 slave.cpp:389] Slave checkpoint: true
I0209 02:12:48.516139 97299 group.cpp:331] Group process (group(1)@10.138.146.230:7101) connected to ZooKeeper
I0209 02:12:48.516216 97299 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0209 02:12:48.516253 97299 group.cpp:403] Trying to create path '/titus/main/mesos' in ZooKeeper
I0209 02:12:48.520268 97275 detector.cpp:156] Detected a new leader: (id='209')
I0209 02:12:48.520803 97284 group.cpp:674] Trying to get '/titus/main/mesos/json.info_0000000209' in ZooKeeper
I0209 02:12:48.520874 97278 state.cpp:54] Recovering state from '/mnt/data/mesos/meta'
I0209 02:12:48.520961 97278 state.cpp:690] Failed to find resources file '/mnt/data/mesos/meta/resources/resources.info'
I0209 02:12:48.523680 97283 detector.cpp:481] A new leading master (UPID=master@10.230.95.110:7103) is detected


The .FATAL log file when the original transient ZK error occurred:

Log file created at: 2016/02/05 17:21:37
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0205 17:21:37.395644 53841 zookeeper.cpp:110] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]


The .ERROR log file:
<= br>
Log file created at: 2016/02/05 17:21:37
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0205 17:21:37.395644 53841 zookeeper.cpp:110] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]

The .WARNING file had the same content.
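
A note on that FATAL line: zookeeper_init() is the ZooKeeper C client call that creates the session handle. It returns NULL and sets errno on failure, which matches the ": No such file or directory [2]" suffix in the logged message. The failing call presumably follows a pattern along these lines (a sketch for illustration only, not the actual Mesos source; the server list and timeout are placeholders):

// Sketch of the zookeeper_init() failure path (illustrative; placeholder servers/timeout).
#include <zookeeper/zookeeper.h>

#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Watcher for connection/session events; a real client reacts to ZOO_CONNECTED_STATE etc.
static void watcher(zhandle_t* zh, int type, int state, const char* path, void* context) {}

int main() {
  const char* servers = "zk1:2181,zk2:2181,zk3:2181";  // placeholder ensemble

  zhandle_t* zh = zookeeper_init(servers, watcher, 10000 /* session timeout ms */,
                                 nullptr /* client id */, nullptr /* context */, 0 /* flags */);
  if (zh == nullptr) {
    // zookeeper_init() reports failure via errno; the agents here logged errno 2
    // (ENOENT). What exactly was missing was never identified in this thread.
    std::fprintf(stderr, "Failed to create ZooKeeper, zookeeper_init: %s [%d]\n",
                 std::strerror(errno), errno);
    std::abort();
  }

  // ... use the handle ...
  zookeeper_close(zh);
  return 0;
}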

Maybe related: https://issues.apache.org/jira/browse/MESOS-1326

-rgs








