From: Graham Dumpleton
To: nicolas@lehuen.com
Cc: python-dev@httpd.apache.org
Subject: Re: Solving the import problem
Date: Wed, 8 Jun 2005 11:11:31 +1000
Message-Id: <4b1e18d795947d5230d91bdd753c6f42@dscpl.com.au>

An update on a few things that I have managed to get working in
Vampire in respect of some of the issues below, plus a few other
comments.

On 08/06/2005, at 6:33 AM, Nicolas Lehuen wrote:

> One last thing that we should prepare is a clear and definite answer
> to the zillion users who need to import a custom utility module.
> Today, we have 4 ways of importing code :
>
> a) the standard "import" keyword. Today, it works unchanged
> (mod_python doesn't install any import hook). The consequence is that
> the only modules that can be imported this way are those found on the
> PYTHONPATH.
> Importing custom code is easy if you can manipulate this
> variable (either directly or through the PythonPath configuration
> directive), but not everybody has this luxury (think shared hosting,
> although not being able to change the PythonPath through an .htaccess
> file seems pretty restrictive to me.).

I finally worked out the proper way in Python to install import hooks
so that you don't screw up other packages that are also trying to use
import hooks, although it relies on the other packages doing it the
correct way as well.

The result is that in Vampire, when the feature is enabled, you can
use the "import" keyword to import modules local to the document tree
where the handler is, and those imports will go through the Vampire
module importing system instead. Where the context is traceable back
to a top level import of a handler from Vampire, the automatic module
reloading mechanism, including changes in children causing parents to
be reloaded, is all working okay.

When this feature kicks in, it will only search in the same directory
as the handler file is located and optionally along a module search
path which is distinct from the normal sys.path. This search path has
to be separate and can't overlap with sys.path, because otherwise you
can end up with duplicate modules loaded in different ways. The
preferred approach is that sys.path should simply not include any
directory which is a part of the document tree.

The only part of what "import" provides that isn't working completely
yet is importation of packages. The bits that do work are importing
the root of the package, importing a sub module/package of the package
which was already imported by the parent, and using the from/import
syntax to import only bits of any of these. The one bit that I haven't
been able to get working yet is where you have "import package.module"
and where "module" wasn't explicitly imported by
"package/__init__.py".
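
As a rough illustration of that failing case, here is a minimal sketch.
It is not Vampire's code: it assumes a current Python 3 with importlib,
and the package name vampire_demo_pkg and the temporary directory are
made up for the example. A package root is loaded privately, without
being placed in sys.modules, and the standard machinery is then asked
for one of its sub modules.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Hypothetical package living outside sys.path, standing in for code
# kept in a document tree.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "vampire_demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # the package root does not import "child" itself
with open(os.path.join(pkg_dir, "child.py"), "w") as f:
    f.write("VALUE = 42\n")

# Load the package root privately, the way an importer with its own
# cache might, without registering it in sys.modules.
spec = importlib.util.spec_from_file_location(
    "vampire_demo_pkg",
    os.path.join(pkg_dir, "__init__.py"),
    submodule_search_locations=[pkg_dir],
)
package = importlib.util.module_from_spec(spec)
spec.loader.exec_module(package)

# The standard machinery looks for the parent in sys.modules, does not
# find it, and falls back to a sys.path search that also fails.
try:
    importlib.import_module("vampire_demo_pkg.child")
    failed_without_registration = False
except ImportError:
    failed_without_registration = True

# Once the parent is registered, its __path__ is used to find the child.
sys.modules["vampire_demo_pkg"] = package
try:
    child = importlib.import_module("vampire_demo_pkg.child")
finally:
    # Leave sys.modules as we found it.
    sys.modules.pop("vampire_demo_pkg.child", None)
    sys.modules.pop("vampire_demo_pkg", None)

print(failed_without_registration, child.VALUE)
```

The first attempt fails only because the parent is missing from
sys.modules; nothing about the files on disk changes between the two
attempts.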
The reason it doesn't work is that the part of the Python import
system that deals with packages assumes that all imported modules are
stored in sys.modules. It relies on this and will search sys.modules
for the parent module to determine which directory it is in and thus
from where it should import the sub module/package. At the moment this
makes it look to me like any system that uses import hooks in Python
cannot support packages whose modules/packages are not stored in
sys.modules.

Because of this, even though packages partly work, at the moment I
throw an import error, with a message saying that packages aren't
supported in the context of the Vampire module importing system, if
such an import is attempted.

This shouldn't be an issue for individual handler files stored in the
document tree, as you wouldn't normally write them as packages anyway.
It might be an issue if someone had a set of utility modules living
outside the document tree that they wanted automatic reloading to work
on. The only choice there at the moment is not to use a traditional
package in that context. You could get more flexibility by accessing
the module loading API in Vampire directly, but that means the utility
modules, which perhaps shouldn't strictly know about
Vampire/mod_python, will.

> b) the PythonImport directive, which ensure that a module is imported
> (hence its initialization code is ran), but doesn't really import it
> into the handler's or published module's namespace.
>
> c) the apache.import_module() function, which is frankly a strange
> beast. It knows how to approximately reload some modules, but has many
> tricks that makes it frankly dangerous, namely the one that causes
> modules with the same name but residing in different directories to
> collide. I really think that mixing dynamic (re)loading of code with
> the usual import mechanisms is a recipe for disaster.
> Anyway, today, it's the only way our users can import some shared
> code, using a strange idiom like
> apache.import_module('mymodule',[dirname(__file__)]).

I know you have marked:

  http://issues.apache.org/jira/browse/MODPYTHON-9

as resolved by virtue of including a new module importing system in
publisher, but there is still the underlying problem in the
import_module() function that once you access an "index.py" in a
subdirectory, the one in the parent is effectively lost. I realise
that even if this is fixed, each still gets reloaded on cyclic
requests, but at least the parent doesn't become completely useless.

> d) the new publisher.get_page(req,path), which is not really an answer
> since it is designed to allow a published object to call another
> published object from another page (not to call some shared code).
>
> This mess should be sorted out. As a baseline, I'd say that we have 4
> kinds of code in mod_python :

Brain slowing down at this point. I'll perhaps come back with some
more coherent thoughts on the rest of your points later when I have
got some other things out of the way. :-)

> 1) the standard Python code that should be imported using the "import"
> keyword
>
> 2) handlers, which are dynamically loaded through apache.import_module
> (so they are declared in sys.module, with all the problem that can
> cause when sharing a single setup with multiple handlers that have the
> same name, "publisher" for example) - this should be fixed.
>
> 3) published modules, which are dynamically loaded by the
> mod_python.publisher handler (so now they don't have any problems that
> were previously caused by apache.import_module). An important thing to
> notice is that published module are usually stored in a directory
> which is visible by Apache (handlers don't need to reside in a public
> directory), amongst .html and image files. Hence, people can
> legitimately be reluctant to put their core application code
> (including DB passwords etc.)
> in published modules, for security and
> code/presentation separation issues.
>
> 4) custom library code, AKA core application code. This code should
> reside somewhere, preferably in a private directory (at least direct
> access to this code from the web should be denied) and be easily
> imported and reloaded into published modules, without having to tinker
> too much with the PYTHONPATH variable or the PythonPath directive.
>
> What would be nice is a clear and definite way to handle those 4 kinds
> of code. To me, layers 2, 3 and 4 could be handled by the same dynamic
> code cache, except that a careful directory structure or naming scheme
> would prevent the layer 4 to be visible from the web.
>
> I know Vampire solves a lot of these problems, so we have two
> alternatives :
>
> A) We decide that we won't solve the whole problem into mod_python. We
> take apache.import_module out and shoot it. Handlers are loaded in a
> real dynamic code cache maybe the same as the one now used by
> mod_python.publisher), which solves a lot of problems.
>
> Custom library code is not handled : if you want to import some code,
> you put it wherever you like and make sure PYTHONPATH or the
> PythonPath directive point to it, so you can import it like a standard
> module. You'll never use apache.import_module anymore, it will
> blissfully dissolve into oblivion (and be removed from the module,
> anyway).
>
> If you need to reload your core application code without restarting
> Apache, then too bad, mod_python doesn't know how to do this. Check
> out Vampire.
>
> B) We decide to solve the whole problem into mod_python.
> apache.import_module is not much luckier this time, it is still taken
> out and shot in the head. We solve the handlers loading problem. But
> now, with a little help from Graham, custom application code can be
> dynamically loaded and reloaded from any place without having to
> tinker with the PYTHONPATH variable and/or the PythonPath directive.
> Everything can be done from the source code with a little help from an
> .htaccess file.
>
> So, sorry for this long mail, but I had to get this out. The current
> situation is pretty bad, zillions of people need to do this simple
> thing, and when they notice it's not that simple (or it's buggy), they
> decide to build the nth application framework on mod_python. So,
> either we reckon it's None of our business, that users should turn to
> higher level frameworks like Vampire, and we remove
> apache.import_module, or we decide to tackle the issue, and we remove
> apache.import_module. Either way, it must leave :).
>
> What do you think ?
>
> Regards,
> Nicolas