incubator-mesos-dev mailing list archives

From "Benjamin Hindman (JIRA)" <>
Subject [jira] [Created] (MESOS-190) Slave seg fault when executor exited
Date Mon, 07 May 2012 17:00:51 GMT
Benjamin Hindman created MESOS-190:

             Summary: Slave seg fault when executor exited
                 Key: MESOS-190
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Hindman
            Assignee: Vinod Kone
            Priority: Blocker

When I restart, kill early, or otherwise interrupt my framework from the
client, the slave often segfaults.  I'm not sure whether there is a bug in
my executor, but Mesos should be more resilient than this.

Mesos subversion -r 1331158

I know optimized builds can be tricky to debug, but in this case it
does look like the slave was trying to dereference an invalid Task*
address (note that the value of task matches %rdx, and the faulting
instruction is dereferencing %rdx).

Any suggestions?

(gdb) bt
#0  mesos::internal::slave::Slave::executorExited (this=0x1305820,
   frameworkId=..., executorId=..., status=0) at slave/slave.cpp:1400
#1  0x00007f0cf310526d in __call<process::ProcessBase*&, 0, 1> (__args=...,
   this=<optimized out>) at /usr/include/c++/4.6/tr1/functional:1153
#2  operator()<process::ProcessBase*> (this=<optimized out>)
   at /usr/include/c++/4.6/tr1/functional:1207
#3  std::tr1::_Function_handler<void (process::ProcessBase*),
std::tr1::_Bind<void (*(std::tr1::_Placeholder<1>,
(mesos::internal::slave::Slave*)> >))(process::ProcessBase*,
(mesos::internal::slave::Slave*)> >)> >::_M_invoke(std::tr1::_Any_data
const&, process::ProcessBase*) (__functor=...,
   __args#0=<optimized out>) at /usr/include/c++/4.6/tr1/functional:1684
#4  0x00007f0cf32014a3 in std::tr1::function<void
(process::ProcessBase*)>::operator()(process::ProcessBase*) const ()
  from /home/ubuntu/cr/lib/
#5  0x00007f0cf31f617f in
process::ProcessBase::visit(process::DispatchEvent const&) () from
#6  0x00007f0cf31f885c in
process::DispatchEvent::visit(process::EventVisitor*) const () from
#7  0x00007f0cf31f38cf in
process::ProcessManager::resume(process::ProcessBase*) () from
#8  0x00007f0cf31ec783 in process::schedule(void*) ()
  from /home/ubuntu/cr/lib/
#9  0x00007f0cf26e5e9a in start_thread ()
  from /lib/x86_64-linux-gnu/
#10 0x00007f0cf24134bd in clone () from /lib/x86_64-linux-gnu/
#11 0x0000000000000000 in ?? ()
(gdb) print task
$1 = (mesos::internal::Task *) 0x3031406576616c73
(gdb) info register
rax            0x7f0cf3647cf0   139693599784176
rbx            0x0      0
rcx            0x7f0ce8000038   139693408649272
rdx            0x3031406576616c73       3472627592201333875
rsi            0x2      2
rdi            0x7f0cf0613ac0   139693549238976
rbp            0x7f0ce80034c8   0x7f0ce80034c8
rsp            0x7f0cf0613c00   0x7f0cf0613c00
r8             0x7f0ce80009b0   139693408651696
r9             0x1      1
r10            0x6      6
r11            0x1      1
r12            0x7f0ce8001ca0   139693408656544
r13            0x7f0ce80056c0   139693408671424
r14            0x7f0ce8006cc0   139693408677056
r15            0x1305820        19945504
rip            0x7f0cf30fecd5   0x7f0cf30fecd5
const&, mesos::ExecutorID const&, int)+533>
eflags         0x10206  [ PF IF RF ]
cs             0xe033   57395
ss             0xe02b   57387
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0


  0x00007f0cf30fecb9 <+505>:   mov    %rax,0x20(%rsp)
  0x00007f0cf30fecbe <+510>:   xor    %ebx,%ebx
  0x00007f0cf30fecc0 <+512>:   cmp    0x20(%rsp),%r12
  0x00007f0cf30fecc5 <+517>:   je     0x7f0cf30fed2e
const&, mesos::ExecutorID const&, int)+622>
  0x00007f0cf30fecc7 <+519>:   test   %r12,%r12
  0x00007f0cf30fecca <+522>:   je     0x7f0cf30ff27d
const&, mesos::ExecutorID const&, int)+1981>
  0x00007f0cf30fecd0 <+528>:   mov    0x28(%r12),%rdx
=> 0x00007f0cf30fecd5 <+533>:   mov    0x70(%rdx),%edi
  0x00007f0cf30fecd8 <+536>:   mov    %rdx,0x8(%rsp)
  0x00007f0cf30fecdd <+541>:   callq  0x7f0cf3062220
  0x00007f0cf30fece2 <+546>:   test   %al,%al
  0x00007f0cf30fece4 <+548>:   mov    0x8(%rsp),%rdx
  0x00007f0cf30fece9 <+553>:   je     0x7f0cf30ff020
const&, mesos::ExecutorID const&, int)+1376>
  0x00007f0cf30fecef <+559>:   test   %rbp,%rbp
  0x00007f0cf30fecf2 <+562>:   je     0x7f0cf30ff244
const&, mesos::ExecutorID const&, int)+1---Type <return> to continue,
or q <re
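
One way to sanity-check the claim that the Task* is garbage is to look at
the pointer value itself. The sketch below (Python, not part of the
original session) takes the value gdb printed for task and %rdx,
reinterprets its eight bytes as little-endian ASCII, and checks whether
the faulting address %rdx + 0x70 is canonical. The 48-bit canonical
address check is an assumption about the machine, and the helper name
is_canonical_48 is hypothetical.

```python
import struct

# Value gdb reported for `task` and for %rdx in the session above.
task_ptr = 0x3031406576616c73

# Reinterpret the 8 pointer bytes in memory order (x86-64 is little-endian).
raw = struct.pack("<Q", task_ptr)
text = raw.decode("ascii")
print(text)


def is_canonical_48(addr):
    """Return True if addr is canonical under 48-bit virtual addressing.

    Assumption: the CPU uses 48-bit virtual addresses, so bits 47..63
    must be all zeros or all ones.
    """
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


# The faulting instruction is `mov 0x70(%rdx),%edi`.
fault_addr = task_ptr + 0x70
print(hex(fault_addr), is_canonical_48(fault_addr))
```

The bytes decode to printable text rather than anything resembling a heap
address, which would be consistent with the memory that once held the
Task* having been overwritten by string data (for example after the
containing object was freed and reused). That reading is a guess from the
register dump, not something confirmed from the source.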
