In-Reply-To: <c298f2d705060713334fd4ad55@mail.gmail.com>
References: <c298f2d705060713334fd4ad55@mail.gmail.com>
Message-Id: <4b1e18d795947d5230d91bdd753c6f42@dscpl.com.au>
Cc: python-dev@httpd.apache.org
From: Graham Dumpleton <grahamd@dscpl.com.au>
Subject: Re: Solving the import problem
Date: Wed, 8 Jun 2005 11:11:31 +1000
To: nicolas@lehuen.com

An update on a few things that I have managed to get working in
Vampire in respect of some of the issues below, plus a few other
comments.

On 08/06/2005, at 6:33 AM, Nicolas Lehuen wrote:

> One last thing that we should prepare is a clear and definite answer
> to the zillion users who need to import a custom utility module.
> Today, we have 4 ways of importing code :
>
> a) the standard "import" keyword. Today, it works unchanged
> (mod_python doesn't install any import hook). The consequence is that
> the only modules that can be imported this way are those found on the
> PYTHONPATH. Importing custom code is easy if you can manipulate this
> variable (either directly or through the PythonPath configuration
> directive), but not everybody has this luxury (think shared hosting,
> although not being able to change the PythonPath through an .htaccess
> file seems pretty restrictive to me.).

I finally worked out the proper way in Python that one is meant to
install import hooks so that you don't screw up other packages also
trying to use import hooks, although it relies on the other packages
doing it the correct way as well.
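
To make "the correct way" a bit more concrete, here is a minimal sketch
in present-day Python. It assumes the hook is installed by wrapping
builtins.__import__ and delegating to whatever was there before; the
names install_import_hook, should_handle and handle are made up for the
illustration and are not Vampire's API.

```python
import builtins


def install_import_hook(should_handle, handle):
    """Chain a custom import hook in front of the one already installed.

    should_handle(name, globals) decides whether the import is ours;
    handle(name, globals, locals, fromlist, level) performs it.
    Both callables are hypothetical, supplied by the caller.
    """
    # Whatever is installed right now may itself be another package's hook,
    # so keep a reference to it rather than to the original builtin.
    previous_import = builtins.__import__

    def hooked_import(name, globals=None, locals=None, fromlist=(), level=0):
        if should_handle(name, globals):
            return handle(name, globals, locals, fromlist, level)
        # Not ours: pass the call down the chain unchanged.
        return previous_import(name, globals, locals, fromlist, level)

    builtins.__import__ = hooked_import

    def uninstall():
        # Only restore if nobody has stacked another hook on top of ours.
        if builtins.__import__ is hooked_import:
            builtins.__import__ = previous_import

    return uninstall
```

The chain only holds together if every package that installs a hook
saves and calls the previous one in the same manner, which is the
dependency on other packages mentioned above.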

The result is that in Vampire, when the feature is enabled, you can
use the "import" keyword to import modules local to the document tree
where the handler is and it will use the Vampire module importing
system instead for those imports. Where the context is traceable back
to a top level import of a handler from Vampire, the automatic module
reloading mechanism, including changes in children causing parents
to be reloaded, is all working okay.
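
As a rough illustration of "changes in children causing parents to be
reloaded", the check can be sketched as below. This is not Vampire's
code; TrackedModule and needs_reload are hypothetical names, and the
sketch assumes each cached module remembers its file's modification
time and the modules it imported.

```python
import os


class TrackedModule:
    """Bookkeeping for one cached module (hypothetical structure)."""

    def __init__(self, path):
        self.path = path
        self.mtime = os.path.getmtime(path)  # timestamp seen at load time
        self.children = []                   # TrackedModule objects it imported


def needs_reload(entry, _seen=None):
    """Return True if the file, or any module it imported, changed on disk."""
    seen = _seen if _seen is not None else set()
    if entry.path in seen:
        # Already visited on this walk: avoids looping on circular imports.
        return False
    seen.add(entry.path)
    if os.path.getmtime(entry.path) != entry.mtime:
        return True
    return any(needs_reload(child, seen) for child in entry.children)
```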

When this feature kicks in, it will only search in the same directory
as the handler file and optionally along a module search path
which is distinct from the normal sys.path. This search path has to
be separate and can't overlap with sys.path because you will end up
with duplicate modules loaded in different ways if one isn't careful.
The preferred approach is that sys.path should simply not include any
directory which is a part of the document tree.
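
A quick sanity check along these lines can be sketched as follows. The
function name and parameters are invented for the example; it simply
reports sys.path entries that either sit inside the document tree or
also appear on the separate module search path.

```python
import os
import sys


def _norm(path):
    # Resolve symlinks and case so the same directory compares equal.
    return os.path.normcase(os.path.realpath(path))


def overlapping_entries(module_search_path, document_root, python_path=None):
    """Return entries of python_path that could load a module a second way.

    module_search_path: the separate search path used by the custom importer.
    document_root: top of the document tree holding the handlers.
    python_path: defaults to sys.path.
    """
    if python_path is None:
        python_path = sys.path
    root = _norm(document_root)
    local = {_norm(p) for p in module_search_path}
    problems = []
    for entry in python_path:
        real = _norm(entry or os.curdir)  # '' on sys.path means the cwd
        inside_tree = real == root or real.startswith(root + os.sep)
        if real in local or inside_tree:
            problems.append(entry)
    return problems
```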

The only part of what "import" provides that isn't working completely
yet is importation of packages. The bits of this that do work are
importing the root of the package, importing a sub module/package of
the package which was already imported by the parent, and using the
from/import syntax to import only bits of any of these.

The one bit that I haven't been able to get working yet is where you
have "import package.module" and where "module" wasn't explicitly
imported by "package/__init__.py".

The reason it doesn't work is that the part of the Python import system
that deals with packages assumes that any module imports are always
stored in sys.modules. It relies on this and will search sys.modules for
the parent module to determine which directory it is in and thus from
where it should import the sub module/package.

At the moment this makes it look to me like any system that tries to use
import hooks in Python cannot support packages where the modules/packages
are not stored in sys.modules.
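
The dependency on sys.modules can be reproduced with a small
experiment. The sketch below uses today's importlib rather than the
Python of this thread, and the package name demo_pkg is invented. It
loads a package root privately, then shows that importing a sub module
only succeeds once the parent has been registered in sys.modules.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Build a throwaway package on disk, deliberately NOT on sys.path.
base = tempfile.mkdtemp()
pkg_dir = os.path.join(base, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "child.py"), "w") as f:
    f.write("VALUE = 42\n")

# Load the package root by file path, as a private module cache might,
# without putting it in sys.modules.
spec = importlib.util.spec_from_file_location(
    "demo_pkg",
    os.path.join(pkg_dir, "__init__.py"),
    submodule_search_locations=[pkg_dir],
)
package = importlib.util.module_from_spec(spec)
spec.loader.exec_module(package)


def try_import_child():
    """Attempt 'import demo_pkg.child'; return the module or None."""
    try:
        return importlib.import_module("demo_pkg.child")
    except ImportError:
        return None


# Parent is only held privately: the import system cannot locate it.
private_only = try_import_child()

# Register the parent; the sub module is now found via package.__path__.
sys.modules["demo_pkg"] = package
registered = try_import_child()
```

Here private_only ends up as None while registered is the loaded sub
module, which matches the behaviour described above.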

Because of this, even though packages partly work, at the moment I throw
an import error with a message saying that packages aren't supported in
the context of the Vampire module importing system if such an import is
attempted. This shouldn't be an issue for individual handler files stored
in the document tree as you wouldn't write them as packages normally
anyway. It might be an issue if someone had a set of utility modules
living outside the document tree that they wanted automatic reloading to
work on. The only choice there at the moment is not to use a traditional
package in that context. You could get more flexibility by accessing the
module loading API in Vampire directly, but that means the utility
modules, that perhaps shouldn't strictly know about Vampire/mod_python,
will.

> b) the PythonImport directive, which ensures that a module is imported
> (hence its initialization code is run), but doesn't really import it
> into the handler's or published module's namespace.
>
> c) the apache.import_module() function, which is frankly a strange
> beast. It knows how to approximately reload some modules, but has many
> tricks that make it frankly dangerous, namely the one that causes
> modules with the same name but residing in different directories to
> collide. I really think that mixing dynamic (re)loading of code with
> the usual import mechanisms is a recipe for disaster. Anyway, today,
> it's the only way our users can import some shared code, using a
> strange idiom like
> apache.import_module('mymodule',[dirname(__file__)]).

I know you have marked:

   http://issues.apache.org/jira/browse/MODPYTHON-9

as resolved by virtue of including a new module importing system in
publisher, but there is still the underlying problem in the
import_module() function that once you access an "index.py" in a
subdirectory, the one in the parent is effectively lost. I realise that
even if this is fixed,
each still gets reloaded on cyclic requests, but at least the parent
doesn't become completely useless.
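
For what it is worth, the collision goes away if the cache is keyed on
the file's absolute path rather than on the module name. The sketch
below is only an illustration of that idea using the standard importlib
API; PathKeyedModuleCache is a hypothetical name, not the code in
publisher or Vampire.

```python
import importlib.util
import os


class PathKeyedModuleCache:
    """Cache modules by absolute file path, never touching sys.modules."""

    def __init__(self):
        self._modules = {}  # absolute path -> (mtime at load, module)

    def load(self, path):
        path = os.path.abspath(path)
        mtime = os.path.getmtime(path)
        cached = self._modules.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged on disk: reuse the loaded module
        # The module name may repeat across directories (e.g. "index");
        # that is harmless because the cache key is the path.
        name = os.path.splitext(os.path.basename(path))[0]
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        self._modules[path] = (mtime, module)
        return module
```

With this arrangement an "index.py" in a subdirectory and the one in
its parent are separate entries, so loading one neither replaces nor
forces a reload of the other.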

> d) the new publisher.get_page(req,path), which is not really an answer
> since it is designed to allow a published object to call another
> published object from another page (not to call some shared code).
>
> This mess should be sorted out. As a baseline, I'd say that we have 4
> kinds of code in mod_python :

Brain slowing down at this point. I'll perhaps come back with some more
coherent thoughts on the rest of your points later when I have got some
other things out of the way. :-)

> 1) the standard Python code that should be imported using the "import" 
> keyword
>
> 2) handlers, which are dynamically loaded through apache.import_module
> (so they are declared in sys.modules, with all the problems that can
> cause when sharing a single setup with multiple handlers that have the
> same name, "publisher" for example) - this should be fixed.
>
> 3) published modules, which are dynamically loaded by the
> mod_python.publisher handler (so now they don't have any problems that
> were previously caused by apache.import_module). An important thing to
> notice is that published modules are usually stored in a directory
> which is visible by Apache (handlers don't need to reside in a public
> directory), amongst .html and image files. Hence, people can
> legitimately be reluctant to put their core application code
> (including DB passwords etc.) in published modules, for security and
> code/presentation separation issues.
>
> 4) custom library code, AKA core application code. This code should
> reside somewhere, preferably in a private directory (at least direct
> access to this code from the web should be denied) and be easily
> imported and reloaded into published modules, without having to tinker
> too much with the PYTHONPATH variable or the PythonPath directive.
>
> What would be nice is a clear and definite way to handle those 4 kinds
> of code. To me, layers 2, 3 and 4 could be handled by the same dynamic
> code cache, except that a careful directory structure or naming scheme
> would prevent the layer 4 to be visible from the web.
>
> I know Vampire solves a lot of these problems, so we have two 
> alternatives :
>
> A) We decide that we won't solve the whole problem into mod_python. We
> take apache.import_module out and shoot it. Handlers are loaded in a
> real dynamic code cache (maybe the same as the one now used by
> mod_python.publisher), which solves a lot of problems.
>
> Custom library code is not handled : if you want to import some code,
> you put it wherever you like and make sure PYTHONPATH or the
> PythonPath directive point to it, so you can import it like a standard
> module. You'll never use apache.import_module anymore, it will
> blissfully dissolve into oblivion (and be removed from the module,
> anyway).
>
> If you need to reload your core application code without restarting
> Apache, then too bad, mod_python doesn't know how to do this. Check
> out Vampire.
>
> B) We decide to solve the whole problem into mod_python.
> apache.import_module is not much luckier this time, it is still taken
> out and shot in the head. We solve the handlers loading problem. But
> now, with a little help from Graham, custom application code can be
> dynamically loaded and reloaded from any place without having to
> tinker with the PYTHONPATH variable and/or the PythonPath directive.
> Everything can be done from the source code with a little help from an
> .htaccess file.
>
> So, sorry for this long mail, but I had to get this out. The current
> situation is pretty bad, zillions of people need to do this simple
> thing, and when they notice it's not that simple (or it's buggy), they
> decide to build the nth application framework on mod_python. So,
> either we reckon it's None of our business, that users should turn to
> higher level frameworks like Vampire, and we remove
> apache.import_module, or we decide to tackle the issue, and we remove
> apache.import_module. Either way, it must leave :).
>
> What do you think ?
>
> Regards,
> Nicolas