From: 王国栋
Date: Wed, 3 Jul 2013 23:10:04 +0800
Subject: Re: Slave crashes when restarting
To: mesos-dev

Yes, it works. Thanks for the reminder!

Guodong

On Wed, Jul 3, 2013 at 10:04 PM, Benjamin Hindman wrote:

> Hi Guodong,
>
> We updated configure.ac, so you'll need to re-run './bootstrap' in the
> top-level directory. Let us know if that fixes the problem for you.
>
> Ben.
>
>
> On Wed, Jul 3, 2013 at 4:08 AM, 王国栋 wrote:
>
> > I pulled the latest code from the trunk. The build fails.
> >
> > *config.status: error: cannot find input file: `bin/mesos-build-env.sh.in'*
> >
> > It seems that a file is missing from the git repo.
> >
> >
> > Guodong
> >
> >
> > On Wed, Jul 3, 2013 at 3:50 PM, 王国栋 wrote:
> >
> > > OK, thanks Vinod. I will try it.
> > >
> > > Guodong
> > >
> > >
> > > On Wed, Jul 3, 2013 at 12:31 PM, Vinod Kone wrote:
> > >
> > >> I think this was recently fixed. Can you try building from the latest
> > >> "master"?
> > >>
> > >>
> > >> On Tue, Jul 2, 2013 at 8:05 PM, 王国栋 wrote:
> > >>
> > >> > I have been doing some failover testing on Mesos recently.
> > >> >
> > >> > The code I am using is pulled from git master. In the following case, I
> > >> > find that a slave may crash from time to time.
> > >> >
> > >> > Steps to reproduce
> > >> > 1. start the mesos cluster
> > >> > 2. start the hadoop jobtracker, which then registers with mesos
> > >> > 3. submit some hadoop jobs, and keep them running
> > >> > 4. kill all the mesos master and slave processes
> > >> > 5. restart the mesos cluster
> > >> >
> > >> > After the slaves are restarted, some of them sometimes crash. Here is
> > >> > the log from one such slave. I hope it helps.
> > >> >
> > >> > I0702 19:03:32.684700 24900 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306860088778333days
> > >> > 2013-07-02 19:03:33,174:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 28ms
> > >> > 2013-07-02 19:03:33,180:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 5 ms
> > >> > 2013-07-02 19:03:36,565:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 57ms
> > >> > 2013-07-02 19:03:36,566:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:03:39,906:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> > >> > 2013-07-02 19:03:43,245:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 12ms
> > >> > 2013-07-02 19:03:43,292:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 46 ms
> > >> > 2013-07-02 19:03:46,588:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 9 ms
> > >> > 2013-07-02 19:03:49,913:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:03:53,277:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 31ms
> > >> > 2013-07-02 19:03:53,293:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 15 ms
> > >> > 2013-07-02 19:03:56,611:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:03:59,967:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 22ms
> > >> > 2013-07-02 19:03:59,968:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:04:03,335:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 33 ms
> > >> > 2013-07-02 19:04:06,672:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 36ms
> > >> > 2013-07-02 19:04:06,691:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 18 ms
> > >> > 2013-07-02 19:04:10,012:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> > >> > 2013-07-02 19:04:13,344:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> > >> > 2013-07-02 19:04:16,707:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 32ms
> > >> > 2013-07-02 19:04:16,737:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 30 ms
> > >> > 2013-07-02 19:04:20,057:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16ms
> > >> > 2013-07-02 19:04:20,067:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 10 ms
> > >> > 2013-07-02 19:04:23,410:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
> > >> > 2013-07-02 19:04:23,411:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > >> > 2013-07-02 19:04:26,820:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 77ms
> > >> > 2013-07-02 19:04:26,919:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 98 ms
> > >> > 2013-07-02 19:04:30,163:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > I0702 19:04:32.685693 24892 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306755345349155days
> > >> > 2013-07-02 19:04:33,514:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 17 ms
> > >> > 2013-07-02 19:04:36,832:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > >> > 2013-07-02 19:04:40,164:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:04:43,498:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:04:46,878:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 46ms
> > >> > 2013-07-02 19:04:46,880:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > >> > 2013-07-02 19:04:50,282:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 71 ms
> > >> > 2013-07-02 19:04:53,565:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 19 ms
> > >> > Result::get() but state == NONE
> > >> > *** Aborted at 1372763096 (unix time) try "date -d @1372763096" if you are using GNU date ***
> > >> > PC: @ 0x3d87a30215 (unknown)
> > >> > *** SIGABRT (@0x613a) received by PID 24890 (TID 0x4878f940) from PID 24890; stack trace: ***
> > >> >     @ 0x3d8860e4c0 (unknown)
> > >> >     @ 0x3d87a30215 (unknown)
> > >> >     @ 0x3d87a31cc0 (unknown)
> > >> >     @ 0x2b02c1bf96e5 mesos::internal::slave::ProcessIsolator::usage()
> > >> >     @ 0x2b02c1b59a30 std::tr1::_Function_handler<>::_M_invoke()
> > >> >     @ 0x2b02c1b5a361 std::tr1::function<>::operator()()
> > >> >     @ 0x2b02c1b63f2b process::internal::pdispatcher<>()
> > >> >     @ 0x2b02c1b5c45e std::tr1::_Function_handler<>::_M_invoke()
> > >> >     @ 0x2b02c1dbf205 process::ProcessManager::resume()
> > >> >     @ 0x2b02c1dbfbbf process::schedule()
> > >> >     @ 0x3d88606367 (unknown)
> > >> >     @ 0x3d87ad30ad (unknown)
> > >> >
> > >> >
> > >> > Guodong
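
The abort message in the log, "Result::get() but state == NONE", together with the ProcessIsolator::usage() frame in the stack trace, suggests that a value was requested from a result that held none. Below is a minimal sketch of that failure mode and of a guard against it. The `Result` class here is a hypothetical stand-in, not the actual Mesos/stout class, and `lookupCpuMillis` and `describeUsage` are made-up helpers for illustration only.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>

// Hypothetical three-state result: a value, nothing, or an error message.
// Illustration only; this is not the Mesos/stout Result class.
template <typename T>
class Result {
  enum class State { kSome, kNone, kError };

public:
  static Result some(const T& value) { return Result(State::kSome, value, ""); }
  static Result none() { return Result(State::kNone, T(), ""); }
  static Result failure(const std::string& message) {
    return Result(State::kError, T(), message);
  }

  bool isSome() const { return state_ == State::kSome; }
  bool isNone() const { return state_ == State::kNone; }
  bool isError() const { return state_ == State::kError; }

  // Aborts when no value is present, which is the behaviour that the log
  // message "Result::get() but state == NONE" points to.
  const T& get() const {
    if (state_ != State::kSome) {
      std::cerr << "Result::get() but state == "
                << (state_ == State::kNone ? "NONE" : "ERROR") << std::endl;
      std::abort();
    }
    return value_;
  }

  const std::string& message() const { return message_; }

private:
  Result(State state, const T& value, const std::string& message)
    : state_(state), value_(value), message_(message) {}

  State state_;
  T value_;
  std::string message_;
};

// Hypothetical lookup: CPU time in milliseconds for a tracked pid.
// Returns none() when the pid is not in the map.
Result<long> lookupCpuMillis(const std::map<int, long>& tracked, int pid) {
  if (pid <= 0) {
    return Result<long>::failure("invalid pid");
  }
  std::map<int, long>::const_iterator it = tracked.find(pid);
  if (it == tracked.end()) {
    return Result<long>::none();
  }
  return Result<long>::some(it->second);
}

// Checks the state before calling get(), so a missing value is reported
// to the caller instead of aborting the process.
std::string describeUsage(const Result<long>& usage) {
  if (usage.isError()) {
    return "error: " + usage.message();
  }
  if (usage.isNone()) {
    return "no usage data";
  }
  return "cpu time: " + std::to_string(usage.get()) + " ms";
}
```

With this guard, a caller asking about an untracked pid gets "no usage data" back rather than a SIGABRT. Whether the fix in master took this form is not stated in the thread.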