hawq-dev mailing list archives

From Ming Li <...@pivotal.io>
Subject Re: [R] Fwd: Help: malloc/free deadlock in unsafe signal handler 'Rf_onsigusr1'
Date Wed, 03 Aug 2016 08:02:36 GMT
Thanks, Luke,

From your explanation, it seems that SIGUSR1 is only sent when the user
wants to break/cancel R execution, so we can't control when SIGUSR1
arrives. For this defect, then, the best fix is to make sure the signal
handler makes no direct or indirect (cascaded) calls to malloc/free. If a
buffer is needed, it is better to make it static and allocate it before
the signal handler can be called.


On Tue, Aug 2, 2016 at 11:39 PM, <luke-tierney@uiowa.edu> wrote:

> Redirecting to R-devel
>
> I don't recall how long the SIGUSR handlers have been in R -- you can
> check in svn if you like -- it's been a long time. The intention is
> for them to serve as an emergency break -- a chance of possibly saving
> the workspace when you get stuck in an infinite loop in C/Fortran code
> that can't be interrupted by a SIGINT. This can't be accomplished
> without doing things that really shouldn't be done in a signal handler.
> That is all these handlers are intended for. If you are using them
> programmatically you should rethink what you are doing. If you explain
> what you are trying to do you might get some help with that.
>
> Best,
>
> luke
>
>
> On Tue, 2 Aug 2016, Ming Li wrote:
>
>> Thanks, Luke. CC'ing the HAWQ dev team.
>> I sent this email to R-devel two days before forwarding it to R-help, but
>> no one replied.
>>
>> Is there any workaround? When are SIGUSR1 and SIGUSR2 sent in R? Or
>> should we move all non-emergency operations out of the signal handler?
>> Thanks.
>>
>> On Tue, Aug 2, 2016 at 4:02 AM, <luke-tierney@uiowa.edu> wrote:
>>       The handlers for SIGUSR1 and SIGUSR2 are really intended as an
>>       emergency break, not for ordinary programming. These could be
>>       rewritten to be safer but that would make them less immediate.
>>
>>       Followups would be more appropriate on R-devel.
>>
>>       Best,
>>
>>       luke
>>
>>       On Mon, 1 Aug 2016, Ming Li wrote:
>>
>>       Hi all,
>>
>>       I am working on a bug that occurs when running PLR on HAWQ.
>>       The process hung and can't be terminated.
>>
>>       From my investigation, it seems the signal handler
>>       'Rf_onsigusr1' triggers a malloc/free deadlock.
>>
>>       The call stack is below.
>>
>>       Thread 1 (Thread 0x7f4c93af48e0 (LWP 431263)):
>>       #0  0x00007f4c9015805e in __lll_lock_wait_private () from /lib64/libc.so.6
>>       #1  0x00007f4c900dd16b in _L_lock_9503 () from /lib64/libc.so.6
>>       #2  0x00007f4c900da6a6 in malloc () from /lib64/libc.so.6
>>       #3  0x00007f4c9008fb39 in _nl_make_l10nflist () from /lib64/libc.so.6
>>       #4  0x00007f4c9008ddf5 in _nl_find_domain () from /lib64/libc.so.6
>>       #5  0x00007f4c9008d6e0 in __dcigettext () from /lib64/libc.so.6
>>       #6  0x00007f4c6fabcfe3 in Rf_onsigusr1 () from /usr/local/lib64/R/lib/libR.so
>>       #7  <signal handler called>
>>       #8  0x00007f4c9014079a in brk () from /lib64/libc.so.6
>>       #9  0x00007f4c90140845 in sbrk () from /lib64/libc.so.6
>>       #10 0x00007f4c900dd769 in __default_morecore () from /lib64/libc.so.6
>>       #11 0x00007f4c900d87a2 in _int_free () from /lib64/libc.so.6
>>       #12 0x0000000000b3ff24 in gp_free2 ()
>>       #13 0x0000000000b356fc in AllocSetDelete ()
>>       #14 0x0000000000b38391 in MemoryContextDeleteImpl ()
>>       #15 0x000000000077c851 in ExecEndAgg ()
>>       #16 0x00000000007592ad in ExecEndNode ()
>>       #17 0x000000000075186c in ExecEndPlan ()
>>       #18 0x000000000079dffa in ExecEndSubqueryScan ()
>>       #19 0x000000000075921d in ExecEndNode ()
>>       #20 0x000000000075186c in ExecEndPlan ()
>>       #21 0x0000000000752565 in ExecutorEnd ()
>>       #22 0x00000000006dd9bd in PortalCleanup ()
>>       #23 0x0000000000b3f077 in AtCommit_Portals ()
>>       #24 0x000000000051abe5 in CommitTransaction ()
>>       #25 0x000000000051f1d5 in CommitTransactionCommand ()
>>       #26 0x000000000099809e in PostgresMain ()
>>       #27 0x00000000008f1031 in BackendStartup ()
>>       #28 0x00000000008f70e0 in PostmasterMain ()
>>       #29 0x00000000007f63da in main ()
>>
>>
>>       I searched and found the following, which may help fix it: the
>>       best way to avoid this kind of deadlock is to call only
>>       asynchronous-safe functions within signal handlers.
>>
>>
>> https://www.securecoding.cert.org/confluence/display/c/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers
>>
>>       Thanks a lot.
>>
>>
>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>    Actuarial Science
>> 241 Schaeffer Hall                  email:   luke-tierney@uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
