Date: Tue, 17 Feb 2009 21:00:43 +1100
Message-ID: <88e286470902170200y4413b8c5s78e68a9bc03c7bb@mail.gmail.com>
In-Reply-To: <499A68A4.10405@apache.org>
Subject: Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
From: Graham Dumpleton
To: dev@httpd.apache.org

2009/2/17 Mladen Turk :
> Graham Dumpleton wrote:
>>
>> 2009/2/17 Joe Orton :
>>>>
>>>> I did used to perform a dup, but was told that this would cause
>>>> problems with file locking. Specifically was told:
>
>>> I'm getting lost here. What has file locking got to do with it? Does
>>> mod_wscgi rely on file locking somehow?
>>
>
> I'm lost as well :)

Consider:

  fd1 = ....

  lock(fd1)

  fd2 = dup(fd1)

  close(fd2)  # will release the lock under some lock APIs, even though
              # this is not the last reference to the underlying file object

  write(fd1)  # lock has already been released, so not guaranteed to be
              # the only writer

  close(fd1)

At least that is how I understand it from what has been explained to
me and pointed out in various documentation.
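
As a rough illustration of the sequence above, here is a minimal Python
sketch. It assumes a POSIX system where fcntl.lockf() takes fcntl()-style
record locks, which are among the lock APIs that drop the lock when any
descriptor for the file is closed. The helper names are made up for the
example and nothing here is taken from mod_wsgi or Apache.

```python
import fcntl
import os
import tempfile


def other_process_can_lock(path):
    """Return True if a separate process can take an exclusive lock on path."""
    pid = os.fork()
    if pid == 0:
        # Child: a different process, so a lock held by the parent conflicts.
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0  # lock acquired, so the parent no longer holds it
        except OSError:
            status = 1  # lock still held elsewhere (or open failed)
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def demo_dup_releases_lock():
    """Hypothetical demo: lock fd1, then dup and close the duplicate."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)  # closed before any lock is taken
    fd1 = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd1, fcntl.LOCK_EX)         # lock(fd1)
        before = other_process_can_lock(path)   # probe while lock is held
        fd2 = os.dup(fd1)                       # fd2 = dup(fd1)
        os.close(fd2)                           # close(fd2)
        after = other_process_can_lock(path)    # probe again
        return before, after
    finally:
        os.close(fd1)
        os.unlink(path)


if __name__ == "__main__":
    before, after = demo_dup_releases_lock()
    print("other process could lock before close(fd2):", before)
    print("other process could lock after close(fd2): ", after)
```

Locks taken with flock() belong to the open file description rather
than to the process and file pair, so closing one duplicate does not
release them. That is why the problem only shows up under some lock
APIs.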

So, if fd2 is the file descriptor created for the file bucket in Apache
and it gets closed before the application later writes to the file
through fd1, then the application has lost the exclusive ownership it
acquired by way of the lock. Something else could have acquired the
lock and started modifying the file on the basis that it has exclusive
ownership at that time.

>> In WSGI applications, it is possible for the higher level Python web
>> application to pass back a file object reference for the response with
>> the intent that the WSGI adapter use any optimised methods available
>> for sending it back as response. This is where file buckets come into
>> the picture to begin with.
>
> Now it looks that you are trying to intermix the third party
> maintained native OS file descriptors and file buckets.
> You can create the apr_file_t from apr_os_file_t

Which is what it does. Simplified code below:

  apr_os_file_t fd = -1;
  apr_file_t *tmpfile = NULL;

  /* Get the OS descriptor behind the Python file-like object. */
  fd = PyObject_AsFileDescriptor(filelike);

  /* Wrap it as an APR file allocated from the request pool. */
  apr_os_file_put(&tmpfile, &fd, APR_SENDFILE_ENABLED, self->r->pool);

> (Think you'll have platform portability issues there)

The optimisation is only supported on UNIX systems.

> but the major problem would be to ensure the life cycle
> of the object, since Python has it's own GC and httpd has
> it's pool.
> IMHO you will need a new apr_bucket provider written in
> Python and C for something like that.

CPython uses reference counting. What is referred to as GC in Python
is actually just a mechanism that kicks in under certain circumstances
to break cycles between reference counted objects.

Having a special bucket type which holds a reference to the Python
file object will not help anyway. This is because the close() method
of the Python file object can be called before the file bucket is
destroyed. The Python file object would then be closed before the
delayed write of the file bucket that results from the EOS
optimisation. So, same problem as when using a naked file descriptor.
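
To make the lifetime problem concrete, here is a small Python sketch,
not code from mod_wsgi. It assumes a UNIX system, and the function names
are invented for the example. It shows that a descriptor taken from a
Python file object stops being valid once close() is called on that
object, whereas a dup()'d descriptor stays open (which brings back the
locking concern described earlier).

```python
import os
import tempfile


def fd_is_open(fd):
    """Return True if fd still refers to an open file in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def demo_close_invalidates_fd():
    """Hypothetical demo of a descriptor outliving its Python file object."""
    f = tempfile.TemporaryFile()
    naked_fd = f.fileno()        # what a file bucket would be handed
    dup_fd = os.dup(naked_fd)    # the alternative: an independent descriptor
    f.close()                    # application closes its file object early
    naked_ok = fd_is_open(naked_fd)
    dup_ok = fd_is_open(dup_fd)
    os.close(dup_fd)
    return naked_ok, dup_ok


if __name__ == "__main__":
    naked_ok, dup_ok = demo_close_invalidates_fd()
    print("naked fd usable after close():", naked_ok)
    print("dup'd fd usable after close(): ", dup_ok)
```

Holding an extra Python reference to the file object would not change
the first result, since close() acts on the descriptor regardless of
how many references to the object remain.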

Also, using a special bucket type opens another can of worms. This is
because multiple interpreters are supported as well as multithreading.
Thus it would be necessary to track the named interpreter in use
within the bucket, reacquire the lock on that interpreter and ensure
the thread state is correctly reinstated. Although possible to do, it
gets a bit messy.

Holding onto the file descriptor to allow the optimisation isn't
really desirable for other reasons as well. This is because the WSGI
specification effectively requires the response content to have been
flushed out to the client before the final call back into the
application to clean things up. In that final call back, where the
application performs cleanup and closes things like files, it could
technically rewrite the content of the file. If Apache has not
finished writing out the contents of the file, presuming the Python
file object hadn't been closed, then Apache would end up writing
different content from what was expected, and possibly truncated
content if the file was resized.

In summary, you need a way of knowing that when you flush something it
really has been flushed and that Apache is all done with it.

Graham