Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
MIME-Version: 1.0
In-Reply-To: <499A68A4.10405@apache.org>
References: <88e286470902131525v57cfaab1s58f84c0f76cf264a@mail.gmail.com>
	 <20090216100746.GA5873@redhat.com>
	 <88e286470902160352r13a1cdaid5779f7eb833f68d@mail.gmail.com>
	 <20090216132140.GB9041@redhat.com>
	 <88e286470902162225l3238d42agf4dff89dbd367b0f@mail.gmail.com>
	 <499A68A4.10405@apache.org>
Date: Tue, 17 Feb 2009 21:00:43 +1100
Message-ID: <88e286470902170200y4413b8c5s78e68a9bc03c7bb@mail.gmail.com>
Subject: Re: Problems with EOS optimisation in ap_core_output_filter() and
	file buckets.
From: Graham Dumpleton <graham.dumpleton@gmail.com>
To: dev@httpd.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

2009/2/17 Mladen Turk <mturk@apache.org>:
> Graham Dumpleton wrote:
>>
>> 2009/2/17 Joe Orton <jorton@redhat.com>:
>>>>
>>>> I did used to perform a dup, but was told that this would cause
>>>> problems with file locking. Specifically was told:
>
>>> I'm getting lost here.  What has file locking got to do with it?  Does
>>> mod_wsgi rely on file locking somehow?
>>
>
> I'm lost as well :)

Consider:

  fd1 = ....

  lock(fd1)

  fd2 = dup(fd1)

  close(fd2) # under some lock APIs this releases the lock, even though
             # fd2 is not the last reference to the underlying file object

  write(fd1) # lock already released, so not guaranteed to be the only writer

  close(fd1)

At least that is how I understand it from what is being explained to
me and pointed out in various documentation.

So, if fd2 is the file descriptor created for the file bucket in Apache
and it gets closed before the application later writes to the file
through fd1, the application has lost the exclusive ownership it
acquired by way of the lock. Something else could by then have acquired
the lock and started modifying the file on the basis that it has
exclusive ownership at that time.
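
As a rough illustration of the sequence above, here is a minimal Python
sketch. It assumes a UNIX system and POSIX record locks taken with
fcntl.lockf(); the helper names (other_process_can_lock, demo) are made up
for this example and are not part of mod_wsgi or Apache. A second process
is used as the probe because these locks are per-process.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Small script run in a separate process: exit 0 if it can take the lock.
PROBE = (
    "import fcntl, sys\n"
    "f = open(sys.argv[1], 'r+')\n"
    "try:\n"
    "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
    "sys.exit(0)\n"
)


def other_process_can_lock(path):
    """Return True if a separate process can lock the file exclusively."""
    result = subprocess.run([sys.executable, "-c", PROBE, path])
    return result.returncode == 0


def demo():
    fd1, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd1, fcntl.LOCK_EX)          # lock(fd1)
        held_before = not other_process_can_lock(path)

        fd2 = os.dup(fd1)                        # fd2 = dup(fd1)
        os.close(fd2)                            # close(fd2)

        # fd1 is still open, but is the lock still held?
        held_after = not other_process_can_lock(path)
        os.close(fd1)                            # close(fd1)
    finally:
        os.unlink(path)
    return held_before, held_after


if __name__ == "__main__":
    print(demo())
```

With POSIX record locks on Linux I would expect held_before to be true and
held_after to be false. Locks taken with flock() are tied to the open file
rather than the process, so the same sequence should keep the lock there,
which is the "under some lock APIs" caveat.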

>> In WSGI applications, it is possible for the higher level Python web
>> application to pass back a file object reference for the response with
>> the intent that the WSGI adapter use any optimised methods available
>> for sending it back as response. This is where file buckets come into
>> the picture to begin with.
>
> Now it looks that you are trying to intermix the third party
> maintained native OS file descriptors and file buckets.
> You can create the apr_file_t from apr_os_file_t

Which is what it does. Simplified code below:

  apr_os_file_t fd = -1;
  apr_file_t *tmpfile = NULL;

  fd = PyObject_AsFileDescriptor(filelike);

  apr_os_file_put(&tmpfile, &fd, APR_SENDFILE_ENABLED, self->r->pool);

> (Think you'll have platform portability issues there)

The optimisation is only supported on UNIX systems.
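
For context, the Python side of this looks roughly like the sketch below.
It uses wsgi.file_wrapper from the WSGI specification and, as a stand-in
for the server's own wrapper, wsgiref.util.FileWrapper from the standard
library. The application and the descriptor_for() helper are hypothetical;
the helper only mimics what PyObject_AsFileDescriptor() does by calling
fileno().

```python
import tempfile
from wsgiref.util import FileWrapper


def application(environ, start_response):
    # Hypothetical application: serves a temporary file it has just written.
    f = tempfile.TemporaryFile()
    f.write(b"hello from a file\n")
    f.seek(0)

    start_response("200 OK", [("Content-Type", "text/plain")])

    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # Hand the file object back so the server may use an optimised path.
        return wrapper(f, 8192)
    # Fallback when the server offers no wrapper: plain block iteration.
    return iter(lambda: f.read(8192), b"")


def descriptor_for(result):
    """Return the OS file descriptor behind a wrapped file, or None."""
    filelike = getattr(result, "filelike", None)
    if filelike is None or not hasattr(filelike, "fileno"):
        return None
    return filelike.fileno()


if __name__ == "__main__":
    result = application({"wsgi.file_wrapper": FileWrapper},
                         lambda status, headers: None)
    print(descriptor_for(result))
    result.close()
```

The descriptor obtained this way is still owned by the Python file object,
which is why its lifetime matters for anything that uses it later.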

> but the major problem would be to ensure the life cycle
> of the object, since Python has its own GC and httpd has
> its pool.
> IMHO you will need a new apr_bucket provider written in
> Python and C for something like that.

CPython uses reference counting. What is referred to as GC in Python
is actually just a mechanism that kicks in under certain circumstances
to break cycles between reference counted objects.
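
A small sketch of that distinction, assuming CPython (other Python
implementations manage memory differently). The Node class and demo()
function are invented for the example.

```python
import gc
import weakref


class Node:
    pass


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # make sure the cycle collector does not run on its own
    try:
        # No cycle: freed as soon as the last reference goes away.
        a = Node()
        ref_a = weakref.ref(a)
        del a
        freed_by_refcount = ref_a() is None

        # Self-referencing object: reference counting alone cannot free it.
        b = Node()
        b.self_ref = b
        ref_b = weakref.ref(b)
        del b
        cycle_survives = ref_b() is not None

        # An explicit collection breaks the cycle.
        gc.collect()
        cycle_collected = ref_b() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, cycle_survives, cycle_collected


if __name__ == "__main__":
    print(demo())
```

On CPython all three values should come out true: the plain object is
released immediately, and only the cyclic one needs the collector.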

Having a special bucket type which holds a reference to the Python
file object will not help anyway. This is because the close() method
of the Python file object can be called before the file bucket is
destroyed. The Python file object would then be closed before the
delayed write of the file bucket that results from the EOS
optimisation. So, same problem as when using a naked file descriptor.
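
To make the ordering problem concrete, here is a minimal sketch on a UNIX
system. The "borrowed" descriptor stands in for the one handed to the file
bucket, and the delayed os.pread() stands in for the deferred write; both
names and the demo() function are assumptions for illustration only.

```python
import os
import tempfile


def demo():
    f = tempfile.TemporaryFile()
    f.write(b"response body")
    f.flush()

    borrowed = f.fileno()        # what a server would get from fileno()
    duplicate = os.dup(borrowed) # an independent descriptor, for comparison

    f.close()                    # application closes its file object early

    # A delayed read through the borrowed descriptor now fails.
    try:
        os.pread(borrowed, 64, 0)
        borrowed_errno = None
    except OSError as exc:
        borrowed_errno = exc.errno

    # The duplicate still refers to the open file.
    data = os.pread(duplicate, 64, 0)
    os.close(duplicate)
    return borrowed_errno, data


if __name__ == "__main__":
    print(demo())
```

The duplicate survives the close(), but dup() is exactly what runs into
the lock release behaviour described earlier in this message.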

Also, using a special bucket type opens another can of worms. This is
because multiple interpreters are supported as well as multithreading.
Thus it would be necessary to track the named interpreter in use
within the bucket, reacquire the lock on that interpreter, and ensure
the thread state is correctly reinstated. Although possible to do, it
gets a bit messy.

Holding onto the file descriptor to allow the optimisation isn't
really desirable for other reasons as well. This is because the WSGI
specification effectively requires the response content to have been
flushed out to the client before the final call back into the
application to clean things up. In that final call back, where the
application performs cleanup and closes things like files, it could
technically rewrite the content of the file. If Apache has not
finished writing out the contents of the file by then, and presuming
the Python file object hasn't been closed, Apache would end up writing
different content to what was expected, and possibly truncated content
if the file was resized.
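
A sketch of that failure mode, again assuming a UNIX system. The
server_fd descriptor plays the part of a file bucket that has not been
written out yet, and the rewrite stands in for whatever the application
does in its cleanup call; demo() is a made-up name.

```python
import os
import tempfile


def demo():
    f = tempfile.TemporaryFile()
    f.write(b"original response body")
    f.flush()

    # The server keeps its own descriptor but has not sent anything yet.
    server_fd = os.dup(f.fileno())

    # Cleanup call back into the application: it rewrites the file.
    f.seek(0)
    f.truncate()
    f.write(b"changed")
    f.flush()
    f.close()

    # The delayed write now picks up the new, shorter content.
    sent = os.pread(server_fd, 64, 0)
    os.close(server_fd)
    return sent


if __name__ == "__main__":
    print(demo())
```

What reaches the client here is the rewritten content, not the response
the application produced before cleanup.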

In summary, you need a way of knowing that when you flush something,
it really has been flushed and Apache is all done with it.

Graham