mesos-issues mailing list archives

From "Joseph Wu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MESOS-5748) Potential segfault in `link` and `send` when linking to a remote process
Date Wed, 29 Jun 2016 23:50:12 GMT
Joseph Wu created MESOS-5748:
--------------------------------

             Summary: Potential segfault in `link` and `send` when linking to a remote process
                 Key: MESOS-5748
                 URL: https://issues.apache.org/jira/browse/MESOS-5748
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.28.0, 0.27.0, 0.26.0, 0.25.0, 0.24.0, 0.23.0, 0.22.0
            Reporter: Joseph Wu
             Fix For: 1.0.0


There is a race in the SocketManager between a remote {{link}} and disconnection of the underlying
socket.

We potentially segfault here: https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

{{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} object.  However,
the code above this line actually has ownership of the pointer:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

If the socket dies during the link, {{ignore_recv_data}} may delete the Socket out from under
{{link}}:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
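
For illustration only, here is a minimal sketch of one way to avoid this kind of race. It is not the libprocess code and not necessarily the eventual fix. {{FakeSocket}}, {{SocketTable}}, {{add}}, {{close}}, and {{acquire}} are hypothetical names made up for this example. The assumption is that the link/send path copies the shared pointer while holding the lock, so the object stays alive even if the disconnect path drops its reference in the meantime.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a reference-counted socket.
// This is NOT libprocess's Socket class.
struct FakeSocket {
  explicit FakeSocket(int fd) : fd(fd) {}
  int fd;
};

// Hypothetical, heavily simplified socket table. All names are illustrative.
class SocketTable {
public:
  // Registers a socket for the given file descriptor.
  void add(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_[fd] = std::make_shared<FakeSocket>(fd);
  }

  // Models the disconnect path: drops the table's reference to the socket.
  void close(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_.erase(fd);
  }

  // Models the link/send path: copies the shared pointer under the lock,
  // so the caller holds its own reference after the lock is released.
  // Returns nullptr if the socket has already been removed.
  std::shared_ptr<FakeSocket> acquire(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sockets_.find(fd);
    if (it == sockets_.end()) {
      return nullptr;
    }
    return it->second;
  }

private:
  std::mutex mutex_;
  std::map<int, std::shared_ptr<FakeSocket>> sockets_;
};
```

With this shape, a caller that has obtained a pointer from {{acquire}} can keep using it after a concurrent {{close}}, because the object is only destroyed once the last reference goes away. A caller that arrives after {{close}} gets a null pointer and has to handle the missing socket explicitly.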

----
The same race exists for {{send}}.

This race was discovered while running a new test in repetition:
https://reviews.apache.org/r/49175/

On OSX, I hit the race consistently every 500-800 repetitions:
{code}
3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteLink" --gtest_break_on_failure --gtest_repeat=1000
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
