Message-ID: <47429CC1.4090909@apache.org>
Date: Tue, 20 Nov 2007 09:37:21 +0100
From: Mladen Turk
To: APR Developer List
Subject: Problems with apr_pool_clear/destroy

Hi,

IMHO there is one serious problem with the way we currently clear or
destroy a pool with respect to its cleanup callbacks. As always, these
problems only show up in threaded applications ;)

The core of the problem is that a pool's child pools are destroyed
before the pool's own cleanups are run:

    apr_pool_destroy/clear {
        for each child in child_pools
            run apr_pool_destroy child
        run our own cleanups
        run our process cleanups
        ...
    }

This ordering makes a few things impossible and causes core dumps:

1. If we are blocked in an APR function (apr_socket_accept, for example)
   using our own pool that is a child of the pool being cleared/destroyed,
   we cannot trust that our local data still points to valid memory
   allocated from that child pool once the accept is broken by the parent
   pool's destroy (see the sketch below).

2. If we have multiple threads spawning processes from their own pools,
   we cannot register cleanup_for_exec on the parent pool, because it will
   core dump in free_proc_chain: if the process was created from a child
   pool, free_proc_chain will reference already-deallocated memory. The
   only alternative is to register the cleanup_for_exec in the child pool,
   and that leads to multiple 3-second delays (the very thing
   cleanup_for_exec was supposed to deal with).
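To make the first scenario concrete, here is a minimal sketch (error
checking omitted; the port number and the layout of main() are just for
illustration). The worker blocks in apr_socket_accept() with a pool that
is a child of 'parent'; with the current destroy order the child pool's
memory is freed before the parent cleanup that closes the listener runs,
so when the accept finally returns the worker dereferences freed memory.
Whether closing the listener actually wakes the blocked accept is
platform-specific, but the ordering problem is the same either way:

    #include <apr_general.h>
    #include <apr_time.h>
    #include <apr_pools.h>
    #include <apr_network_io.h>
    #include <apr_thread_proc.h>

    static apr_socket_t *lsock;   /* listener, allocated from the parent pool */

    static void * APR_THREAD_FUNC worker(apr_thread_t *thd, void *data)
    {
        apr_pool_t *parent = data;
        apr_pool_t *child;
        apr_socket_t *csock;
        char *buf;

        apr_pool_create(&child, parent);   /* our own pool, child of parent  */
        buf = apr_palloc(child, 8192);     /* local data from the child pool */

        /* Blocks here.  apr_pool_destroy(parent) in the main thread destroys
         * 'child' first (freeing buf and the memory backing csock) and only
         * afterwards runs the parent cleanup that closes lsock and breaks
         * the accept.  When accept returns we touch freed memory. */
        apr_socket_accept(&csock, lsock, child);
        buf[0] = '\0';                     /* use after free */
        return NULL;
    }

    int main(void)
    {
        apr_pool_t *parent;
        apr_sockaddr_t *sa;
        apr_thread_t *t;

        apr_initialize();
        apr_pool_create(&parent, NULL);

        apr_sockaddr_info_get(&sa, NULL, APR_INET, 8080, 0, parent);
        apr_socket_create(&lsock, APR_INET, SOCK_STREAM, APR_PROTO_TCP, parent);
        apr_socket_bind(lsock, sa);
        apr_socket_listen(lsock, 5);

        apr_thread_create(&t, NULL, worker, parent, parent);
        apr_sleep(apr_time_from_sec(1));   /* let the worker reach accept()  */

        apr_pool_destroy(parent);          /* children freed, cleanups later */
        apr_terminate();
        return 0;
    }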
My proposal is that we change the way apr_pool_clear/destroy operates:

    apr_pool_destroy/clear {
        for each child in child_pools
            run child cleanups recursively
        run our own cleanups
        for each child in child_pools
            run child process cleanups recursively
        run our process cleanups
        for each child in child_pools
            run apr_pool_destroy child
        ...
    }

This way the cleanups are run for all child pools, and their child pools,
throughout the pool chain without deallocating any memory. Only after all
the cleanups have been run is the memory destroyed, in the same order as
before.

We can even add two new functions, apr_pool_run_cleanups(apr_pool_t *) and
apr_pool_run_cleanup_for_exec(apr_pool_t *), that recursively run all the
cleanups and remove them from the cleanup chains. After such a call a
multithreaded app can join its threads and then safely call clear/destroy,
assured that all blocking calls have already been exited.

Does that make any sense?

Regards,
Mladen
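For what it's worth, here is a rough sketch of how apr_pool_run_cleanups()
could look. None of this exists today; it assumes the internal child/sibling
links and the cleanups chain as laid out in memory/unix/apr_pools.c, plus
the existing static run_cleanups() helper there, and is only meant to show
the recursion order from the proposal above:

    /* Hypothetical, does not exist in APR today.  The field names (child,
     * sibling, cleanups) and run_cleanups() refer to the internals of
     * memory/unix/apr_pools.c, so this would live in that file. */
    static void run_cleanups_recursive(apr_pool_t *p)
    {
        apr_pool_t *child;

        /* Children first, depth-first, exactly as in the proposal above. */
        for (child = p->child; child; child = child->sibling)
            run_cleanups_recursive(child);

        /* Then our own cleanups; empty the chain so that a later
         * clear/destroy does not run them a second time. */
        run_cleanups(&p->cleanups);
        p->cleanups = NULL;
    }

    APR_DECLARE(void) apr_pool_run_cleanups(apr_pool_t *p)
    {
        run_cleanups_recursive(p);
    }

A threaded app would then call apr_pool_run_cleanups(parent) to break the
blocking calls, join its worker threads, and only then call
apr_pool_destroy(parent).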