mesos-dev mailing list archives

From Alexander Rojas <alexan...@mesosphere.io>
Subject Re: [jira] [Comment Edited] (MESOS-2079) IO.Write test is flaky on OS X 10.10.
Date Wed, 11 Nov 2015 08:44:57 GMT
What I meant is that we may not need to care about SIGPIPE (which tells us a pipe was
broken), because the writing side is notified anyway when it tries to write into the pipe,
and the reading side gets an EOF.

The only reason I can see for caring about SIGPIPE is if we want to know that the pipe
broke as soon as it happens.
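
To make the two notifications concrete, here is a minimal sketch of what each side sees once SIGPIPE is ignored. It is not taken from the Mesos or libprocess sources: the helper names `errnoOfWriteToBrokenPipe` and `readAfterWriterClosed` are made up for illustration, and it assumes a POSIX system where `pipe`, `read`, `write` and `signal` behave as specified.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: writes one byte to a pipe whose read end is already
// closed, with SIGPIPE ignored, and returns the errno reported by write()
// (or 0 if the write unexpectedly succeeded).
int errnoOfWriteToBrokenPipe()
{
  // Ignore SIGPIPE process-wide so a broken pipe surfaces only as an error
  // return from write().
  ::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (::pipe(fds) != 0) {
    return errno;
  }

  ::close(fds[0]); // No reader is left, so the pipe is now broken.

  const char byte = 'x';
  ssize_t n = ::write(fds[1], &byte, 1);
  int error = (n == -1) ? errno : 0;

  ::close(fds[1]);
  return error;
}

// Hypothetical helper: closes the write end of a fresh pipe and returns what
// read() reports on the reading side; a return value of 0 means EOF.
ssize_t readAfterWriterClosed()
{
  int fds[2];
  if (::pipe(fds) != 0) {
    return -1;
  }

  ::close(fds[1]); // The writer goes away without writing anything.

  char buffer[16];
  ssize_t n = ::read(fds[0], buffer, sizeof(buffer));

  ::close(fds[0]);
  return n;
}
```

Under those assumptions the first helper should report `EPIPE` and the second should report EOF. With SIGPIPE left at its default disposition, the same `write` call would terminate the process instead of returning.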
> On 06 Nov 2015, at 19:10, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:
> 
> To answer your questions:
> 
> We use pipes when we need to communicate across the process boundary after
> a fork. Look for Subprocess::IO::Pipe for examples. There is plenty of code
> using pipes.
> 
> Sockets aren't an issue as one can avoid SIGPIPE across OS X (SO_NOSIGPIPE)
> and Linux (MSG_NOSIGNAL).
> 
> I'm a bit confused by your comment about the timing of SIGPIPE, which seems
> to suggest that the raising of SIGPIPE is not tied to the bad write call.
> Why do you think this?
> 
> On Fri, Nov 6, 2015 at 4:37 AM, Alexander Rojas <alexander@mesosphere.io>
> wrote:
> 
>> I have multiple questions here
>> 
>> 1. Why do we use pipes at all? or is SIGPIPE raised also when writing into
>> sockets? which leads me to:
>> 2. Do we use it only in test cases or is there something actively using
>> pipes?
>> 
>> SIGPIPE itself is a weird signal, since a failed call to `write` returns
>> -1 and sets `errno` to `EPIPE` so there are two ways to deal with errors
>> when the reading process is no longer reading, one is handling the return
>> value+errno (which usually means ignoring the SIGPIPE) and the second is
>> ignoring the return value and handling SIGPIPE. The difference is that
>> SIGPIPE is raised as soon as the OS realizes the pipe is broken while the
>> error on the write happens when you actually try to write on the pipe.
>> 
>> All in all, I prefer to ignore the signal and deal with the return value
>> of `write`.
>> 
>>> On 06 Nov 2015, at 03:27, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:
>>> 
>>> Just want to surface this up to the dev@ thread to raise some awareness.
>>> Recently with the SIGPIPE bug from libev [1], we've revisited whether it
>>> makes sense to continue down the path of leaving SIGPIPE unblocked and
>>> trying to handle it case by case.
>>> 
>>> We originally wanted users of libprocess to decide on their own whether
>>> they want to ignore SIGPIPE. However, we'd like to reconsider:
>>> 
>>> (a) The amount of code that is needed to work around SIGPIPE is
>>> substantial, especially because on OS X SIGPIPE appears to not be
>>> delivered synchronously [2]. Also, it is not possible to create pipes that
>>> don't surface SIGPIPE (unlike sockets), so in order to safely write to a
>>> pipe we need to wrap write() calls with signal suppression blocks (which
>>> we don't do in general!). You can get a sense of the code from [3] and [4].
>>> 
>>> (b) SIGPIPE seems to be more of a legacy mechanism to shut down a set of
>>> piped programs and the general recommendation seems to be to not bother
>>> with it and ignore it. Programs can handle EPIPE as they would any other
>>> error.
>>> 
>>> Would love to hear if there are any concerns. I will be glad to shepherd
>>> James' changes here.
>>> 
>>> [1] https://issues.apache.org/jira/browse/MESOS-2768
>>> [2] https://issues.apache.org/jira/browse/MESOS-2079
>>> [3] https://reviews.apache.org/r/39940/diff/1#index_header
>>> [4] https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp#L101
>>> 
>>> On Wed, Nov 4, 2015 at 9:20 AM, James Peach (JIRA) <jira@apache.org> wrote:
>>> 
>>>> 
>>>>   [ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989947#comment-14989947 ]
>>>> 
>>>> James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM:
>>>> -------------------------------------------------------------
>>>> 
>>>> These patches globally ignore {{SIGPIPE}} during libprocess
>>>> initialization, document {{SIGPIPE}} behavior a bit more, and remove
>>>> various signal manipulations that were formerly necessary for disabling
>>>> {{SIGPIPE}} delivery.
>>>> 
>>>> https://reviews.apache.org/r/39938/
>>>> https://reviews.apache.org/r/39940/
>>>> https://reviews.apache.org/r/39941/
>>>> 
>>>> 
>>>> 
>>>> was (Author: jamespeach):
>>>> https://reviews.apache.org/r/39938/
>>>> https://reviews.apache.org/r/39940/
>>>> https://reviews.apache.org/r/39941/
>>>> 
>>>> 
>>>>> IO.Write test is flaky on OS X 10.10.
>>>>> -------------------------------------
>>>>> 
>>>>>               Key: MESOS-2079
>>>>>               URL: https://issues.apache.org/jira/browse/MESOS-2079
>>>>>           Project: Mesos
>>>>>        Issue Type: Task
>>>>>        Components: libprocess, technical debt, test
>>>>>       Environment: OS X 10.10
>>>>> {noformat}
>>>>> $ clang++ --version
>>>>> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
>>>>> Target: x86_64-apple-darwin14.0.0
>>>>> Thread model: posix
>>>>> {noformat}
>>>>>          Reporter: Benjamin Mahler
>>>>>          Assignee: James Peach
>>>>>            Labels: flaky
>>>>> 
>>>>> [~benjaminhindman]: If I recall correctly, this is related to
>>>>> MESOS-1658. Unfortunately, we don't have a stacktrace for SIGPIPE
>>>>> currently:
>>>>> {noformat}
>>>>> [ RUN      ] IO.Write
>>>>> make[5]: *** [check-local] Broken pipe: 13
>>>>> {noformat}
>>>>> Running in gdb, seems to always occur here:
>>>>> {code}
>>>>> Program received signal SIGPIPE, Broken pipe.
>>>>> [Switching to process 56827 thread 0x60b]
>>>>> 0x00007fff9a011132 in __psynch_cvwait ()
>>>>> (gdb) where
>>>>> #0  0x00007fff9a011132 in __psynch_cvwait ()
>>>>> #1  0x00007fff903e7ea0 in _pthread_cond_wait ()
>>>>> #2  0x000000010062f27c in Gate::arrive (this=0x101908a10, old=14780) at gate.hpp:82
>>>>> #3  0x0000000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
>>>>> #4  0x00007fff903e72fc in _pthread_body ()
>>>>> #5  0x00007fff903e7279 in _pthread_start ()
>>>>> #6  0x00007fff903e54b1 in thread_start ()
>>>>> {code}
>>>> 
>>>> 
>>>> 
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.3.4#6332)
>>>> 
>> 
>> 

