hawq-dev mailing list archives

From "Kuien Liu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-1573) crash during proc_exit when write message to server log
Date Fri, 15 Dec 2017 04:47:02 GMT
Kuien Liu created HAWQ-1573:
-------------------------------

             Summary: crash during proc_exit when write message to server log
                 Key: HAWQ-1573
                 URL: https://issues.apache.org/jira/browse/HAWQ-1573
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Core
            Reporter: Kuien Liu
            Assignee: Radar Lei


Core stack:


{code:java}
#0 0x00007fba3973bfcb in raise () from /lib64/libpthread.so.0
#1 0x0000000000975e3f in SafeHandlerForSegvBusIll (processName=0xc6bbbb "Master process",
    postgres_signal_arg=11) at elog.c:4537
#2 0x0000000000975ffe in StandardHandlerForSigillSigsegvSigbus_OnMainThread (
    processName=0xc6bbbb "Master process", postgres_signal_arg=11) at elog.c:4615
#3 0x0000000000891804 in CdbProgramErrorHandler (postgres_signal_arg=11) at postgres.c:3609
#4 <signal handler called>
#5 0x00007fba3867d431 in __strlen_sse2_pminub () from /lib64/libc.so.6
#6 0x00000000009729cc in append_string_to_pipe_chunk (buffer=0x7ffe98a042c0,
    input=0x4257c50 <Address 0x4257c50 out of bounds>) at elog.c:2660
#7 0x0000000000973d12 in write_message_to_server_log (elevel=15, sqlerrcode=0,
    message=0x21b0260 "clean up communication to resource manager now.", detail=0x0, hint=0x0,
    query_text=0x4257c50 <Address 0x4257c50 out of bounds>, cursorpos=0, internalpos=0,
    internalquery=0x0, context=0x0, funcname=0xcc9450 <__func__.29200> "cleanupQD2RMComm",
    show_funcname=0 '\000', filename=0xcc83e0 "rmcomm_QD2RM.c", lineno=460, stacktracesize=21,
    omit_location=1 '\001', send_alert=0 '\000', stacktracearray=0x10c8f38 <errordata+120>,
    printstack=0 '\000') at elog.c:3246
#8 0x0000000000973fc4 in send_message_to_server_log (edata=0x10c8ec0 <errordata>) at elog.c:3296
#9 0x0000000000970b50 in EmitErrorReport () at elog.c:1495
#10 0x000000000096e589 in errfinish (dummy=0) at elog.c:602
#11 0x0000000000970a7d in elog_finish (elevel=15,
    fmt=0xcc85c8 "clean up communication to resource manager now.") at elog.c:1464
#12 0x00000000009d29da in cleanupQD2RMComm () at rmcomm_QD2RM.c:460
#13 0x0000000000875d73 in proc_exit_prepare (code=1) at ipc.c:240
#14 0x0000000000875c24 in proc_exit (code=1) at ipc.c:101
#15 0x000000000096e78d in errfinish (dummy=0) at elog.c:671
#16 0x00000000008919d2 in ProcessInterrupts () at postgres.c:3693
#17 0x00000000006eedf6 in ExecProcNode (node=0x425cfd0) at execProcnode.c:862
#18 0x00000000006e9132 in ExecutePlan (estate=0x425c400, planstate=0x425cfd0, operation=CMD_SELECT,
    numberTuples=0, direction=ForwardScanDirection, dest=0x37d2c00) at execMain.c:3211
#19 0x00000000006e5cdc in ExecutorRun (queryDesc=0x4258270, direction=ForwardScanDirection,
    count=0) at execMain.c:1214
#20 0x00000000008991dd in PortalRunSelect (portal=0x3850e50, forward=1 '\001', count=0, dest=0x37d2c00)
    at pquery.c:1737
{code}


I found a relevant fix in PostgreSQL:


{noformat}
commit e1eb7c81192bec3735eed3228202b400f31c8010
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Mar 20 00:58:21 2010 +0000

    Clear error_context_stack and debug_query_string at the beginning of proc_exit,
    so that we won't try to attach any context printouts to messages that get
    emitted while exiting.  Per report from Dennis Koegel, the context functions
    won't necessarily work after we've started shutting down the backend, and it
    seems possible that debug_query_string could be pointing at freed storage
    as well.  The context information doesn't seem particularly relevant to
    such messages anyway, so there's little lost by suppressing it.

    Back-patch to all supported branches.  I can only demonstrate a crash with
    log_disconnections messages back to 8.1, but the risk seems real in 8.0 and
    before anyway.
{noformat}

I see that [~mli] backported something under HAWQ-1208, but I don't know why this was not fixed in
Greenplum at that time (it is fixed there now).  We need to apply the same patch to HAWQ as well.
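
For illustration only, here is a minimal standalone sketch of the idea in the PostgreSQL commit message: reset error_context_stack and debug_query_string before any exit-time logging, so the logger never dereferences a stale query pointer (frame #7 above shows query_text out of bounds). The struct layout, log_message() and proc_exit_prepare_sketch() are simplified, hypothetical stand-ins and not the actual HAWQ or PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the backend's error context callback list. */
typedef struct ErrorContextCallback
{
    struct ErrorContextCallback *previous;
    void (*callback)(void *arg);
    void *arg;
} ErrorContextCallback;

/* Stand-ins for the two globals named in the commit message. */
ErrorContextCallback *error_context_stack = NULL;
const char *debug_query_string = NULL;

/*
 * Hypothetical logger: attaches the query text only when the pointer is set.
 * Returns 1 if the query text was attached, 0 otherwise.
 */
int log_message(const char *message)
{
    assert(message != NULL);
    if (debug_query_string != NULL)
    {
        printf("%s (query: %s)\n", message, debug_query_string);
        return 1;
    }
    printf("%s\n", message);
    return 0;
}

/*
 * Hypothetical exit path: clear both globals first, so messages emitted
 * by exit-time cleanup cannot touch context or query storage that may
 * already have been freed.
 */
int proc_exit_prepare_sketch(void)
{
    error_context_stack = NULL;
    debug_query_string = NULL;
    return log_message("clean up communication to resource manager now.");
}
```

In this sketch the reset comes before the cleanup message, which mirrors the ordering the commit message describes ("at the beginning of proc_exit"). Where exactly the reset belongs in HAWQ's ipc.c is not something this sketch settles.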



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)