mesos-dev mailing list archives

From Scott Smith <scott.sm...@gmail.com>
Subject Re: segfault in libprocess (slave)
Date Wed, 09 May 2012 22:01:04 GMT
I've run today with a similar patch and it (along with the MESOS-190
fix) addresses my segfault issues.  Before I would get 5+ per day;
today has been core file free!
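
For anyone following the thread, here is a minimal sketch of the kind of locking discussed in the quoted messages below. It is not the code committed in r1336417. The `Socket` struct, the `close`, `count`, and `size` helpers, and `run_concurrent_accepts` are hypothetical stand-ins, and a `std::mutex` is assumed in place of libprocess's `synchronized(this)` block. The point is only that every access to the `sockets` map, including the insert in `accepted`, takes the same lock.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for libprocess's Socket; only the descriptor is kept.
struct Socket {
  Socket() : fd(-1) {}
  explicit Socket(int s) : fd(s) {}
  int fd;
};

// Sketch of a socket table whose map is only touched while holding one lock.
class SocketManager {
public:
  // The insert is guarded; unguarded, it could race with erase/lookup
  // on other threads and corrupt the map's red-black tree.
  Socket accepted(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_[s] = Socket(s);
  }

  void close(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_.erase(s);
  }

  std::size_t count(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.count(s);
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.size();
  }

private:
  std::mutex mutex_;
  std::map<int, Socket> sockets_;
};

// Calls accepted() from several threads with distinct descriptors and
// returns the final number of entries in the map.
std::size_t run_concurrent_accepts(int threads, int per_thread) {
  assert(threads > 0 && per_thread > 0);
  SocketManager manager;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&manager, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        manager.accepted(t * per_thread + i);
      }
    });
  }
  for (std::thread& w : workers) {
    w.join();
  }
  return manager.size();
}
```

With the lock in place, concurrent calls such as `run_concurrent_accepts(4, 1000)` should leave one entry per descriptor; without it, concurrent writes to a `std::map` are a data race and the behavior is undefined.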

On Wed, May 9, 2012 at 2:47 PM, Benjamin Hindman <benh@eecs.berkeley.edu> wrote:
> I've committed a fix in r1336417. Please let me know if this fixes the
> problem or if more needs to be done. Thank you!
>
>
> On Wed, May 9, 2012 at 1:46 PM, Benjamin Hindman <benh@eecs.berkeley.edu> wrote:
>
>> Yes, this looks like it should be the case. :(
>>
>> I'll fix this bug ASAP. Thanks for reporting!
>>
>>
>>
>> On Wed, May 9, 2012 at 8:56 AM, Scott Smith <scott.smith@gmail.com> wrote:
>>
>>> I've had numerous other segfaults in libprocess, mostly in
>>> std::map/rbtree code.  Is it possible that SocketManager::accepted is
>>> missing a synchronized(this) {} block?
>>>
>>> from process.cpp:
>>>
>>> Socket SocketManager::accepted(int s)
>>> {
>>>  return sockets[s] = Socket(s);
>>> }
>>>
>>> On Mon, May 7, 2012 at 11:40 PM, Scott Smith <scott.smith@gmail.com> wrote:
>>> > I've encountered another segfault in the slave.  This time, nothing
>>> > unusual was happening.  Single framework / single user.  Four slaves,
>>> > one master, framework run from master.
>>> >
>>> > version:
>>> > svn Revision: 1334534 + proposed fix for MESOS-190:
>>> > https://reviews.apache.org/r/5057/diff/2/#index_header
>>> >
>>> > log messages:
>>> > I0508 06:35:21.458798   828 slave.cpp:447] Got assigned task 8:864:0 for framework 201205080535222558218-5050-29475-0004
>>> > I0508 06:35:21.459225   829 slave.cpp:689] Got acknowledgement of status update for task 8:863:0 of framework 201205080535222558218-5050-29475-0004
>>> > F0508 06:35:21.459432   832 process.cpp:1772] Check failed: sockets.count(s) > 0
>>> >
>>> > stack trace:
>>> > #0  0x00007f0aecdf0445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>> > #1  0x00007f0aecdf3bab in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>> > #2  0x00007f0aedd65dd9 in google::DumpStackTraceAndExit () at src/utilities.cc:145
>>> > #3  0x00007f0aedd5ed9d in google::LogMessage::Fail () at src/logging.cc:1256
>>> > #4  0x00007f0aedd6152f in google::LogMessage::SendToLog (this=0x7f0ae8a71c60) at src/logging.cc:1216
>>> > #5  0x00007f0aedd5e99b in google::LogMessage::Flush (this=0x7f0ae8a71c60) at src/logging.cc:1088
>>> > #6  0x00007f0aedd61dbd in google::LogMessageFatal::~LogMessageFatal (this=0x7f0ae8a71c60, __in_chrg=<optimized out>) at src/logging.cc:1777
>>> > #7  0x00007f0aedc93a55 in process::SocketManager::next(int) () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
>>> > #8  0x00007f0aedc8e119 in process::send_data(ev_loop*, ev_io*, int) () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
>>> > #9  0x00007f0aedd9e6ef in ev_invoke_pending (loop=0x7f0aee119240) at ev.c:1971
>>> > #10 0x00007f0aedda2a24 in ev_loop (loop=0x7f0aee119240, flags=<optimized out>) at ev.c:2333
>>> > #11 0x00007f0aedc8f30d in process::serve(void*) () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
>>> > #12 0x00007f0aed17ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> > #13 0x00007f0aeceac4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> > #14 0x0000000000000000 in ?? ()
>>> >
>>> > --
>>> >         Scott
>>>
>>>
>>>
>>> --
>>>         Scott
>>>
>>
>>



-- 
        Scott
