From: Sébastien Arnaud
To: python-dev@httpd.apache.org
Subject: Regex based publisher proposal
Date: Wed, 6 Sep 2006 23:59:03 -0500
Message-Id: <586C10B1-16E2-4F08-A017-D2199E9CCCEC@emedialibrary.org>

Hi,

I have been following mod_python development with passion for quite a
while now. In light of a few emails over the past few months
discussing web frameworks in mod_python, I decided to try to
contribute to the project, with the goal of moving towards a fast,
flexible MVC web framework based only on mod_python.

I have written 2 or 3 different ones over the past couple of years,
but nothing worth sharing by any means. They have, however, helped me
define what the "dream" web framework for mod_python would be and,
more importantly, identify the plumbing improvements that mod_python
needs.

One of the first needed improvements, in my opinion, is the ability
to route web requests more flexibly than the current publisher module
allows. So, I would like to propose the following module (pubre.py).
It is basically a copy of the mod_python.publisher module, with the
exception that much of the core handler code has been modified to use
a regex to route a web request to the appropriate module/function.

I have been developing against mod_python/trunk, and I have attached
the file for whoever wants to review it and give it a try. Keep in
mind, though, that it is probably still rough around the edges and
that no solid testing has been performed yet. I only did some trivial
benchmarking/stress testing to make sure that, performance-wise, it is
on par with the current mod_python.publisher.

The default behavior is supposed to be 100% compatible with the way
mod_python.publisher behaves. Eventually, though, you would be able to
pass the grammar of the URLs in your web application as a
PythonOption, by simply declaring something like:

    AddHandler mod_python .py .html
    PythonHandler mod_python.pubre
    PythonOption pubregex "(?P<controller>[\w]+)?(\.(?P<extension>[\w]+))?(/(?P<action>[^/]+))?(\?$)?"

I know that not all grammars will work with the attached version
(some of the code still depends on the conservative URL structure
/path/file.ext). Eventually, though, I hope to get this solved and
allow any regex grammar to work.

Anyway, please share your comments and feedback so I can make sure I
am headed in the right direction, keeping in mind that my first goal
is to be able to publish a callable class within a module using a
defined regex URL grammar. I believe that once this first step is
accomplished, the real design of the web framework can begin.

Cheers!
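To make the default grammar above concrete, here is a minimal standalone sketch of how it splits the tail of a URL. It is not taken from pubre.py: the helper name `route` is hypothetical, and the group names (controller, extension, action) and the fallback values are assumed from what the attached handler reads and sets.

```python
import re

# Default grammar from the message. The group names are assumed from the
# m.group(...) calls in the attached handler.
DEFAULT_GRAMMAR = (
    r"(?P<controller>[\w]+)?(\.(?P<extension>[\w]+))?"
    r"(/(?P<action>[^/]+))?(\?$)?"
)

_pattern = re.compile(DEFAULT_GRAMMAR)


def route(url_tail):
    """Hypothetical helper: return (controller, extension, action).

    Missing parts fall back to the same defaults the handler applies:
    'index', 'html' and 'index'.
    """
    match = _pattern.match(url_tail)
    groups = match.groupdict() if match else {}
    return (
        groups.get("controller") or "index",
        groups.get("extension") or "html",
        groups.get("action") or "index",
    )


if __name__ == "__main__":
    for tail in ("hello.py/say", "hello.py", "hello/say", ""):
        print(repr(tail), "->", route(tail))
```

Every group in this grammar is optional, so the pattern matches any input and the defaults decide what gets loaded when a part is absent.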
Sébastien

Attachment: pubre.py (text/x-python-script)

#
# Copyright 2004 Apache Software Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License"); you
# may not use this file except in compliance with the License. You
# may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.
#
# Originally developed by Gregory Trubetskoy.
#
# $Id: publisher.py 384754 2006-03-10 10:20:06Z grahamd $

"""
  This handler is conceptually similar to Zope's ZPublisher, except
  that it:

  1. Is written specifically for mod_python and is therefore much faster
  2. Does not require objects to have a documentation string
  3. Passes all arguments as simple strings
  4. Does not try to match Python errors to HTTP errors
  5. Does not give special meaning to '.' and '..'.
"""

import apache
import util

import sys
import os
from os.path import exists, isabs, normpath, split, isfile, join, dirname
import imp
import re
import base64

import new
import types
from types import *

imp_suffixes = " ".join([x[0][1:] for x in imp.get_suffixes()])

####################### The published page cache ##############################

from cache import ModuleCache, NOT_INITIALIZED

class PageCache(ModuleCache):
    """ This is the cache for page objects. Handles the automatic reloading of pages.
    """

    def key(self, req):
        """ Extracts the normalized filename from the request """
        return req.filename

    def check(self, key, req, entry):
        config = req.get_config()
        autoreload=int(config.get("PythonAutoReload", 1))
        if autoreload==0 and entry._value is not NOT_INITIALIZED:
            # if we don't want to reload and we have a value,
            # then we consider it fresh
            return None
        else:
            return ModuleCache.check(self, key, req, entry)

    def build(self, key, req, opened, entry):
        config = req.get_config()
        log=int(config.get("PythonDebug", 0))
        if log:
            if entry._value is NOT_INITIALIZED:
                req.log_error('Publisher loading page %s'%req.filename, apache.APLOG_NOTICE)
            else:
                req.log_error('Publisher reloading page %s'%req.filename, apache.APLOG_NOTICE)
        return ModuleCache.build(self, key, req, opened, entry)

page_cache = PageCache()

###################### The publisher regex mapper ##############################

class Mapper:
    """ This is the object to cache the regex engine """

    regex = "(?P<controller>[\w]+)?(\.(?P<extension>[\w]+))?(/(?P<action>[^/]+))?(\?$)?"
    regex_compared = 0

    def __init__(self):
        self.reobj = re.compile(self.regex)

    def __call__(self, uri, cre):
        if(cre!=None and not self.regex_compared and cre!=self.regex):
            self.regex = cre
            self.reobj = re.compile(self.regex)
            self.regex_compared = 1
        m = self.reobj.match(uri)
        if m:
            return (m.group('controller'), m.group('extension'), m.group('action'))
        else:
            return (None, None, None)

mapper_cache = Mapper()

####################### The publisher handler itself ##########################

def handler(req):

    req.allow_methods(["GET", "POST", "HEAD"])
    if req.method not in ["GET", "POST", "HEAD"]:
        raise apache.SERVER_RETURN, apache.HTTP_METHOD_NOT_ALLOWED

    path,module_name = os.path.split(req.filename)

    # Trimming the front part of req.uri
    if module_name=='':
        req_url = ''
    else:
        req_url = req.uri[req.uri.index(module_name):]

    # Retrieve custom regex mapping if any
    # Warning, depending on the custom regex passed along
    # some of the code in handler might need tweaking
    # to make sure all is functional (missing . comes to mind)
    try:
        custom_regex = req.get_options()["pubregex"]
    except KeyError:
        custom_regex = None

    # Use the mapper_cache obj to determine
    # the controller, extension and action requested
    controller, extension, action = mapper_cache(req_url, custom_regex)

    # Set defaults if None values returned
    if controller==None:
        controller = 'index'
    if extension==None:
        extension = 'html'
    if action==None:
        action = 'index'

    # Now determine the actual Python module code file
    # to load. This will first try looking for the file
    # '/path/<controller>.py'.
    req.filename = path + '/' + controller + '.py'
    if not exists(req.filename):
        raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND

    # Normalise req.filename to avoid Win32 issues.
    req.filename = normpath(req.filename)

    # We use the page cache to load the module
    module = page_cache[req]

    # does it have an __auth__?
    realm, user, passwd = process_auth(req, module)

    # resolve the object ('traverse')
    object = resolve_object(req, module, action, realm, user, passwd)

    # publish the object
    published = publish_object(req, object)

    # we log a message if nothing was published, it helps with debugging
    if (not published) and (req.bytes_sent==0) and (req.next is None):
        log=int(req.get_config().get("PythonDebug", 0))
        if log:
            req.log_error("mod_python.publisher: nothing to publish.")

    return apache.OK

def process_auth(req, object, realm="unknown", user=None, passwd=None):

    found_auth, found_access = 0, 0

    if hasattr(object, "__auth_realm__"):
        realm = object.__auth_realm__

    func_object = None

    if type(object) is FunctionType:
        func_object = object
    elif type(object) == types.MethodType:
        func_object = object.im_func

    if func_object:
        # functions are a bit tricky

        func_code = func_object.func_code
        func_globals = func_object.func_globals

        if "__auth__" in func_code.co_names:
            i = list(func_code.co_names).index("__auth__")
            __auth__ = func_code.co_consts[i+1]
            if hasattr(__auth__, "co_name"):
                __auth__ = new.function(__auth__, func_globals)
            found_auth = 1

        if "__access__" in func_code.co_names:
            # first check the constant names
            i = list(func_code.co_names).index("__access__")
            __access__ = func_code.co_consts[i+1]
            if hasattr(__access__, "co_name"):
                __access__ = new.function(__access__, func_globals)
            found_access = 1

        if "__auth_realm__" in func_code.co_names:
            i = list(func_code.co_names).index("__auth_realm__")
            realm = func_code.co_consts[i+1]

    else:
        if hasattr(object, "__auth__"):
            __auth__ = object.__auth__
            found_auth = 1
        if hasattr(object, "__access__"):
            __access__ = object.__access__
            found_access = 1

    if found_auth or found_access:
        # because ap_get_basic insists on making sure that AuthName and
        # AuthType directives are specified and refuses to do anything
        # otherwise (which is technically speaking a good thing), we
        # have to do base64 decoding ourselves.
        #
        # to avoid needless header parsing, user and password are parsed
        # once and then received as arguments
        if not user and req.headers_in.has_key("Authorization"):
            try:
                s = req.headers_in["Authorization"][6:]
                s = base64.decodestring(s)
                user, passwd = s.split(":", 1)
            except:
                raise apache.SERVER_RETURN, apache.HTTP_BAD_REQUEST

    if found_auth:

        if not user:
            # note that Opera supposedly doesn't like spaces around "=" below
            s = 'Basic realm="%s"' % realm
            req.err_headers_out["WWW-Authenticate"] = s
            raise apache.SERVER_RETURN, apache.HTTP_UNAUTHORIZED

        if callable(__auth__):
            rc = __auth__(req, user, passwd)
        else:
            if type(__auth__) is DictionaryType:
                rc = __auth__.has_key(user) and __auth__[user] == passwd
            else:
                rc = __auth__

        if not rc:
            # no spaces around "=" here either, for the same reason as above
            s = 'Basic realm="%s"' % realm
            req.err_headers_out["WWW-Authenticate"] = s
            raise apache.SERVER_RETURN, apache.HTTP_UNAUTHORIZED

    if found_access:

        if callable(__access__):
            rc = __access__(req, user)
        else:
            if type(__access__) in (ListType, TupleType):
                rc = user in __access__
            else:
                rc = __access__

        if not rc:
            raise apache.SERVER_RETURN, apache.HTTP_FORBIDDEN

    return realm, user, passwd

### Those are the traversal and publishing rules ###

# tp_rules is a dictionary, indexed by type, with tuple values.
# The first item in the tuple is a boolean telling if the object can be traversed (default is True)
# The second item in the tuple is a boolean telling if the object can be published (default is True)
tp_rules = {}

# by default, built-in types cannot be traversed, but can be published
default_builtins_tp_rule = (False, True)
for t in types.__dict__.values():
    if isinstance(t, type):
        tp_rules[t]=default_builtins_tp_rule

# those are the exceptions to the previous rules
tp_rules.update({
    # Those are neither traversable nor publishable
    ModuleType          : (False, False),
    BuiltinFunctionType : (False, False),

    # This may change in the near future to (False, True)
    ClassType           : (False, False),
    TypeType            : (False, False),

    # Publishing a generator may not seem to make sense, because
    # it can only be done once. However, we could get a brand new generator
    # each time a new-style class property is accessed.
    GeneratorType       : (False, True),

    # Old-style instances are traversable
    InstanceType        : (True, True),
})

# types which are not referenced in the tp_rules dictionary will be traversable
# AND publishable
default_tp_rule = (True, True)

def resolve_object(req, obj, object_str, realm=None, user=None, passwd=None):
    """
    This function traverses the objects separated by .
    (period) to find the last one we're looking for.
    """
    parts = object_str.split('.')

    first_object = True
    for obj_str in parts:
        # path components starting with an underscore are forbidden
        if obj_str[0]=='_':
            req.log_error('Cannot traverse %s in %s because '
                          'it starts with an underscore'
                          % (obj_str, req.unparsed_uri), apache.APLOG_WARNING)
            raise apache.SERVER_RETURN, apache.HTTP_FORBIDDEN

        if first_object:
            first_object = False
        else:
            # if we're not in the first object (which is the module)
            # we're going to check whether we can traverse this type or not
            rule = tp_rules.get(type(obj), default_tp_rule)
            if not rule[0]:
                req.log_error('Cannot traverse %s in %s because '
                              '%s is not a traversable object'
                              % (obj_str, req.unparsed_uri, obj), apache.APLOG_WARNING)
                raise apache.SERVER_RETURN, apache.HTTP_FORBIDDEN

        # we know it's OK to call getattr
        # note that getattr can really call some code because
        # of property objects (or attributes with __get__ special methods)...
        try:
            obj = getattr(obj, obj_str)
        except AttributeError:
            raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND

        # we process the authentication for the object
        realm, user, passwd = process_auth(req, obj, realm, user, passwd)

    # we're going to check if the final object is publishable
    rule = tp_rules.get(type(obj), default_tp_rule)
    if not rule[1]:
        req.log_error('Cannot publish %s in %s because '
                      '%s is not publishable'
                      % (obj_str, req.unparsed_uri, obj), apache.APLOG_WARNING)
        raise apache.SERVER_RETURN, apache.HTTP_FORBIDDEN

    return obj

# This regular expression is used to test for the presence of an HTML header
# tag, written in upper or lower case.
re_html = re.compile(r"</HTML\s*>\s*$",re.I)
re_charset = re.compile(r"charset\s*=\s*([^\s;]+)",re.I)

def publish_object(req, object):
    if callable(object):

        # To publish callables, we call them and recursively publish the result
        # of the call (as done by util.apply_fs_data)

        req.form = util.FieldStorage(req, keep_blank_values=1)
        return publish_object(req,util.apply_fs_data(object, req.form, req=req))

# TODO : we removed this as of mod_python 3.2, let's see if we can put it back
# in mod_python 3.3
#     elif hasattr(object,'__iter__'):
#
#         # To publish iterables, we recursively publish each item
#         # This way, generators can be published
#         result = False
#         for item in object:
#             result |= publish_object(req,item)
#         return result
#
    else:
        if object is None:

            # Nothing to publish
            return False

        elif isinstance(object,UnicodeType):

            # We've got a Unicode string to publish, so we have to encode
            # it to bytes. We try to detect the character encoding
            # from the Content-Type header
            if req._content_type_set:

                charset = re_charset.search(req.content_type)
                if charset:
                    charset = charset.group(1)
                else:
                    # If no character encoding was set, we use UTF8
                    charset = 'UTF8'
                    req.content_type += '; charset=UTF8'

            else:
                # If no character encoding was set, we use UTF8
                charset = 'UTF8'

            result = object.encode(charset)
        else:
            charset = None
            result = str(object)

        if not req._content_type_set:
            # make an attempt to guess content-type
            # we look for a