Subject: Re: mod_http2 and Frequent wake-ups for mpm_event
From: Stefan Eissing
Date: Sun, 22 Jan 2017 17:17:02 +0100
Cc: Yann Ylavic
Message-Id: <8823CE4D-1357-4939-8FD7-108167BA75E9@greenbytes.de>
To: dev@httpd.apache.org

> On 22.01.2017 at 17:14, Stefan Priebe - Profihost AG wrote:
>
> *arg* it's just mod_proxy - just saw thread safety and apr bucket alloc.

??? Can you elaborate? Is your finding the known hcheck bug or something else?

> Stefan
>
> On 22.01.2017 at 17:06, Stefan Priebe - Profihost AG wrote:
>> Looks like others have the same crashes too:
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=60071
>> and
>> https://github.com/apache/httpd/commit/8e63c3c9372cd398f57357099aa941cbba695758
>>
>> So it looks like mod_http2 is running fine now. Thanks a lot, Stefan.
>>
>> Yann, I think I can start testing your mpm patch again after the
>> segfaults in the 2.4 branch are fixed.
>>
>> Greets,
>> Stefan
>>
>> On 22.01.2017 at 13:16, Stefan Priebe wrote:
>>> Hi,
>>>
>>> and a new one, but also in ap_start_lingering_close:
>>>
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  apr_palloc (pool=pool@entry=0x7f455805e138, in_size=in_size@entry=32)
>>>     at memory/unix/apr_pools.c:684
>>> #1  0x00007f456bc5d8b4 in apr_brigade_create (p=0x7f455805e138,
>>>     list=0x7f45040034e8) at buckets/apr_brigade.c:61
>>> #2  0x000055e165efa319 in ap_shutdown_conn (c=c@entry=0x7f455805e458,
>>>     flush=flush@entry=1) at connection.c:76
>>> #3  0x000055e165efa40d in ap_flush_conn (c=0x7f455805e458) at connection.c:95
>>> #4  ap_start_lingering_close (c=0x7f455805e458) at connection.c:145
>>> #5  0x000055e165f942dd in start_lingering_close_blocking (cs=<optimized out>)
>>>     at event.c:876
>>> #6  process_socket (my_thread_num=<optimized out>, my_child_num=<optimized out>,
>>>     cs=0x7f455805e3c8, sock=<optimized out>, p=<optimized out>,
>>>     thd=<optimized out>) at event.c:1153
>>> #7  worker_thread (thd=0x7f455805e138, dummy=0x20) at event.c:2001
>>> #8  0x00007f456b80a0a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #9  0x00007f456b53f62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>> Stefan
>>>
>>> On 21.01.2017 at 19:31, Stefan Priebe wrote:
>>>> All the last traces come from event, process_lingering_close / ap_push_pool,
>>>> but end in different functions. It looks like a race somewhere, and it just
>>>> races at a different function in the event of close and pool clear.
>>>>
>>>> Might there be two places where the same pool gets cleared?
>>>>
>>>> Stefan
>>>>
>>>> On 21.01.2017 at 19:07, Stefan Priebe wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> thanks. No crashes where h2 comes up. But I still have these and no idea
>>>>> how to find out who and why they're crashing.
>>>>>
>>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>> Core was generated by `/usr/local/apache2/bin/httpd -k start'.
>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>> #0  allocator_free (node=0x0, allocator=0x7f6e08066540)
>>>>>     at memory/unix/apr_pools.c:381
>>>>> #1  apr_pool_clear (pool=0x7f6e0808d238) at memory/unix/apr_pools.c:793
>>>>> #2  0x00000000004fe528 in ap_push_pool (queue_info=0x0,
>>>>>     pool_to_recycle=0x7f6e08066548) at fdqueue.c:234
>>>>> #3  0x00000000004fa2c8 in process_lingering_close (cs=0x7f6e0808d4c8,
>>>>>     pfd=0x1d3bf98) at event.c:1439
>>>>> #4  0x00000000004fd410 in listener_thread (thd=0x1d3cb70,
>>>>>     dummy=0x7f6e0808d4c8) at event.c:1704
>>>>> #5  0x00007f6e1aed20a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #6  0x00007f6e1aa0362d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> (gdb) quit
>>>>>
>>>>> Reading symbols from /usr/local/apache/bin/httpd...Reading symbols from
>>>>> /usr/lib/debug//usr/local/apache2/bin/httpd...done.
>>>>> done.
>>>>> [Thread debugging using libthread_db enabled]
>>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>> Core was generated by `/usr/local/apache2/bin/httpd -k start'.
>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>> #0  allocator_free (node=0x0, allocator=0x7f6e08053ae0)
>>>>>     at memory/unix/apr_pools.c:381
>>>>> #1  apr_pool_clear (pool=0x7f6e08076bb8) at memory/unix/apr_pools.c:793
>>>>> #2  0x00000000004fe528 in ap_push_pool (queue_info=0x0,
>>>>>     pool_to_recycle=0x7f6e08053ae8) at fdqueue.c:234
>>>>> #3  0x00000000004fa2c8 in process_lingering_close (cs=0x7f6e08076e48,
>>>>>     pfd=0x1d3bf98) at event.c:1439
>>>>> #4  0x00000000004fd410 in listener_thread (thd=0x1d3cb70,
>>>>>     dummy=0x7f6e08076e48) at event.c:1704
>>>>> #5  0x00007f6e1aed20a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #6  0x00007f6e1aa0362d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> (gdb) quit
>>>>>
>>>>> Stefan
>>>>>
>>>>> On 21.01.2017 at 17:03, Stefan Eissing wrote:
>>>>>> Stefan,
>>>>>>
>>>>>> made a release at https://github.com/icing/mod_h2/releases/tag/v1.8.9
>>>>>> with all patches and (hopefully) improved on them a bit. If you dare
>>>>>> to drop that into your installation, that'd be great.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>> On 21.01.2017 at 15:25, Stefan Priebe wrote:
>>>>>>>
>>>>>>> and I got another crash here:
>>>>>>>
>>>>>>> 2346 static void run_cleanups(cleanup_t **cref)
>>>>>>> 2347 {
>>>>>>> 2348     cleanup_t *c = *cref;
>>>>>>> 2349
>>>>>>> 2350     while (c) {
>>>>>>> 2351         *cref = c->next;
>>>>>>> 2352         (*c->plain_cleanup_fn)((void *)c->data);   <== here
>>>>>>> 2353         c = *cref;
>>>>>>> 2354
>>>>>>>
>>>>>>> which looks similar to the other crash.
>>>>>>>
>>>>>>> #0  0x00007fe4bbd33e1b in run_cleanups (cref=<optimized out>)
>>>>>>>     at memory/unix/apr_pools.c:2352
>>>>>>> #1  apr_pool_clear (pool=0x7fe4a804dac8) at memory/unix/apr_pools.c:772
>>>>>>> #2  0x00000000004feb38 in ap_push_pool (queue_info=0x6d616e79642d3733,
>>>>>>>     pool_to_recycle=0x2) at fdqueue.c:234
>>>>>>> #3  0x00000000004fa8d8 in process_lingering_close (cs=0x7fe4a804dd58,
>>>>>>>     pfd=0x25d3f98) at event.c:1439
>>>>>>>
>>>>>>> Details:
>>>>>>> (gdb) print c
>>>>>>> $1 = (cleanup_t *) 0x7fe4a804e9f0
>>>>>>> (gdb) print *c
>>>>>>> $2 = {next = 0x7fe4a804e870, data = 0x6d616e79642d3733,
>>>>>>>   plain_cleanup_fn = 0x392d3734322e6369,
>>>>>>>   child_cleanup_fn = 0x617465722e722d35}
>>>>>>> (gdb) print *c->data
>>>>>>> Attempt to dereference a generic pointer.
>>>>>>> (gdb) print *c->plain_cleanup_fn
>>>>>>> Cannot access memory at address 0x392d3734322e6369
>>>>>>> (gdb)
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 21.01.2017 at 15:18, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> #0  apr_pool_cleanup_kill (p=0x7fe4a8072358,
>>>>>>>>     data=data@entry=0x7fe4a80723e0,
>>>>>>>>     cleanup_fn=cleanup_fn@entry=0x7fe4bbd38a40)
>>>>>>>>     at memory/unix/apr_pools.c:2276
>>>>>>>>
>>>>>>>> it crashes here in apr:
>>>>>>>> 2276     if (c->data == data && c->plain_cleanup_fn == cleanup_fn) {
>>>>>>>>
>>>>>>>> some lines before, c becomes this:
>>>>>>>> 2264     c = p->cleanups;
>>>>>>>>
>>>>>>>> p is:
>>>>>>>> (gdb) print *p
>>>>>>>> $1 = {parent = 0x256f138, child = 0x7fe46c0751c8,
>>>>>>>>   sibling = 0x7fe4a8096888, ref = 0x7fe4a8069fe8,
>>>>>>>>   cleanups = 0x7fe478159748, free_cleanups = 0x7fe478159788,
>>>>>>>>   allocator = 0x7fe4a803b490, subprocesses = 0x0,
>>>>>>>>   abort_fn = 0x43da00, user_data = 0x0,
>>>>>>>>   tag = 0x502285 "transaction", active = 0x7fe478158d70,
>>>>>>>>   self = 0x7fe4a8072330,
>>>>>>>>   self_first_avail = 0x7fe4a80723d0 "X#\a\250\344\177",
>>>>>>>>   pre_cleanups = 0x7fe4a8072ab8}
>>>>>>>>
>>>>>>>> wouldn't the error mean that p->cleanups is NULL?
>>>>>>>>
>>>>>>>> (gdb) print *p->cleanups
>>>>>>>> $2 = {next = 0x7fe478159628, data = 0x7fe478159648,
>>>>>>>>   plain_cleanup_fn = 0x7fe4bbd2ffd0,
>>>>>>>>   child_cleanup_fn = 0x7fe4bbd2ff70}
>>>>>>>>
>>>>>>>> So p->cleanups->data is 0x7fe478159648 and data is 0x7fe4a80723e0?
>>>>>>>>
>>>>>>>> I don't get why it's segfaulting.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> On 21.01.2017 at 09:50, Yann Ylavic wrote:
>>>>>>>>> Hi Stefan,
>>>>>>>>>
>>>>>>>>> On Sat, Jan 21, 2017 at 9:45 AM, Stefan Priebe wrote:
>>>>>>>>>>
>>>>>>>>>> after running the whole night, these are the only ones still
>>>>>>>>>> happening. Should I revert the mpm patch to check whether it's
>>>>>>>>>> the source?
>>>>>>>>>
>>>>>>>>> Yes please, we need to determine...
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Yann.
>>>>>>
>>>>>> Stefan Eissing
>>>>>>
>>>>>> bytes GmbH
>>>>>> Hafenstrasse 16
>>>>>> 48155 Münster
>>>>>> www.greenbytes.de

Stefan Eissing

bytes GmbH
Hafenstrasse 16
48155 Münster
www.greenbytes.de