mesos-dev mailing list archives

From 王国栋 <wangg...@gmail.com>
Subject Re: Slave crashes when restarting
Date Wed, 03 Jul 2013 15:10:04 GMT
Yes, it works. Thanks for the reminder!

Guodong


On Wed, Jul 3, 2013 at 10:04 PM, Benjamin Hindman <benh@eecs.berkeley.edu> wrote:

> Hi Guodong,
>
> We updated configure.ac, so you'll need to re-run './bootstrap' in the
> top-level directory. Let us know if that fixes the problem for you.
>
> Ben.
>
>
> On Wed, Jul 3, 2013 at 4:08 AM, 王国栋 <wanggd04@gmail.com> wrote:
>
> > I pulled the latest code from the trunk. The build fails.
> >
> > *config.status: error: cannot find input file: `bin/mesos-build-env.sh.in'*
> >
> > It seems that a file is missing from the git repo.
> >
> >
> > Guodong
> >
> >
> > On Wed, Jul 3, 2013 at 3:50 PM, 王国栋 <wanggd04@gmail.com> wrote:
> >
> > > OK, Thanks Vinod. I will try it.
> > >
> > > Guodong
> > >
> > >
> > > On Wed, Jul 3, 2013 at 12:31 PM, Vinod Kone <vinodkone@gmail.com> wrote:
> > >
> > >> I think this was recently fixed. Can you try building from the latest
> > >> "master"?
> > >>
> > >>
> > >> On Tue, Jul 2, 2013 at 8:05 PM, 王国栋 <wanggd04@gmail.com> wrote:
> > >>
> > >> > I have been running some failover tests on Mesos recently.
> > >> >
> > >> > The code I am using was pulled from git master. In the following
> > >> > case, I find that a slave may crash from time to time.
> > >> >
> > >> > Steps to reproduce:
> > >> > 1. Start the Mesos cluster.
> > >> > 2. Start the Hadoop JobTracker, which then registers with Mesos.
> > >> > 3. Submit some Hadoop jobs and keep them running.
> > >> > 4. Kill all the Mesos masters and slaves.
> > >> > 5. Restart the Mesos cluster.
> > >> >
> > >> > After the slaves are restarted, some of them sometimes crash. Here is
> > >> > the log from one such slave; I hope it helps.
> > >> >
> > >> > I0702 19:03:32.684700 24900 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306860088778333days
> > >> > 2013-07-02 19:03:33,174:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 28ms
> > >> > 2013-07-02 19:03:33,180:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 5 ms
> > >> > 2013-07-02 19:03:36,565:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 57ms
> > >> > 2013-07-02 19:03:36,566:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:03:39,906:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> > >> > 2013-07-02 19:03:43,245:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 12ms
> > >> > 2013-07-02 19:03:43,292:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 46 ms
> > >> > 2013-07-02 19:03:46,588:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 9 ms
> > >> > 2013-07-02 19:03:49,913:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:03:53,277:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 31ms
> > >> > 2013-07-02 19:03:53,293:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 15 ms
> > >> > 2013-07-02 19:03:56,611:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:03:59,967:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 22ms
> > >> > 2013-07-02 19:03:59,968:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:04:03,335:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 33 ms
> > >> > 2013-07-02 19:04:06,672:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 36ms
> > >> > 2013-07-02 19:04:06,691:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 18 ms
> > >> > 2013-07-02 19:04:10,012:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> > >> > 2013-07-02 19:04:13,344:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> > >> > 2013-07-02 19:04:16,707:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 32ms
> > >> > 2013-07-02 19:04:16,737:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 30 ms
> > >> > 2013-07-02 19:04:20,057:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16ms
> > >> > 2013-07-02 19:04:20,067:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 10 ms
> > >> > 2013-07-02 19:04:23,410:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
> > >> > 2013-07-02 19:04:23,411:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > >> > 2013-07-02 19:04:26,820:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 77ms
> > >> > 2013-07-02 19:04:26,919:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 98 ms
> > >> > 2013-07-02 19:04:30,163:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > I0702 19:04:32.685693 24892 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306755345349155days
> > >> > 2013-07-02 19:04:33,514:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 17 ms
> > >> > 2013-07-02 19:04:36,832:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > >> > 2013-07-02 19:04:40,164:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:04:43,498:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > >> > 2013-07-02 19:04:46,878:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 46ms
> > >> > 2013-07-02 19:04:46,880:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > >> > 2013-07-02 19:04:50,282:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 71 ms
> > >> > 2013-07-02 19:04:53,565:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 19 ms
> > >> > Result::get() but state == NONE
> > >> > *** Aborted at 1372763096 (unix time) try "date -d @1372763096" if you are using GNU date ***
> > >> > PC: @       0x3d87a30215 (unknown)
> > >> > *** SIGABRT (@0x613a) received by PID 24890 (TID 0x4878f940) from PID 24890; stack trace: ***
> > >> >     @       0x3d8860e4c0 (unknown)
> > >> >     @       0x3d87a30215 (unknown)
> > >> >     @       0x3d87a31cc0 (unknown)
> > >> >     @     0x2b02c1bf96e5 mesos::internal::slave::ProcessIsolator::usage()
> > >> >     @     0x2b02c1b59a30 std::tr1::_Function_handler<>::_M_invoke()
> > >> >     @     0x2b02c1b5a361 std::tr1::function<>::operator()()
> > >> >     @     0x2b02c1b63f2b process::internal::pdispatcher<>()
> > >> >     @     0x2b02c1b5c45e std::tr1::_Function_handler<>::_M_invoke()
> > >> >     @     0x2b02c1dbf205 process::ProcessManager::resume()
> > >> >     @     0x2b02c1dbfbbf process::schedule()
> > >> >     @       0x3d88606367 (unknown)
> > >> >     @       0x3d87ad30ad (unknown)
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > Guodong
> > >> >
> > >>
> > >
> > >
> >
>
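
On the abort message in the quoted log: "Result::get() but state == NONE", together with the top named frame ProcessIsolator::usage(), suggests that usage() asked a result object for its value while the object held none. The sketch below illustrates that failure mode and one way to guard against it. It is not the Mesos code, nor the fix Vinod refers to. Result here is a simplified stand-in written for this illustration, and describeUsage() is a hypothetical caller.

```cpp
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Simplified, hypothetical three-state result: a value, nothing, or an error.
template <typename T>
class Result {
public:
  static Result some(const T& value) { return Result(State::Some, value, ""); }
  static Result none() { return Result(State::None, T(), ""); }
  static Result error(const std::string& message) {
    return Result(State::Error, T(), message);
  }

  bool isSome() const { return state_ == State::Some; }
  bool isNone() const { return state_ == State::None; }
  bool isError() const { return state_ == State::Error; }

  // Calling get() without a value prints a message and aborts, which is the
  // kind of failure the log shows (message followed by SIGABRT).
  const T& get() const {
    if (state_ != State::Some) {
      std::cerr << "Result::get() but state == "
                << (state_ == State::None ? "NONE" : "ERROR") << std::endl;
      std::abort();
    }
    return value_;
  }

  const std::string& message() const { return message_; }

private:
  enum class State { Some, None, Error };

  Result(State state, const T& value, const std::string& message)
    : state_(state), value_(value), message_(message) {}

  State state_;
  T value_;
  std::string message_;
};

// Hypothetical caller: check the state before calling get(), so that a
// missing sample is reported instead of aborting the process.
std::string describeUsage(const Result<double>& cpuSeconds) {
  if (cpuSeconds.isError()) {
    return "usage failed: " + cpuSeconds.message();
  }
  if (cpuSeconds.isNone()) {
    return "usage unavailable: no sample yet";
  }
  std::ostringstream out;
  out << "cpu seconds: " << cpuSeconds.get();
  return out.str();
}
```

With this guard, describeUsage(Result<double>::none()) returns a message rather than aborting. After a restart, a slave may plausibly have no usage sample for a recovered executor yet, which is the kind of case such a check covers.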
