perl-dev mailing list archives

From Stas Bekman <>
Subject Re: RFC: Apache::Registry family re-design
Date Sat, 08 Sep 2001 17:19:01 GMT
On Sat, 8 Sep 2001, Barrie Slaymaker wrote:

> On Sat, Sep 08, 2001 at 05:41:12PM +0800, Stas Bekman wrote:
> > On Fri, 7 Sep 2001, Barrie Slaymaker wrote:
> >
> > > On Fri, Sep 07, 2001 at 04:56:30PM +0800, Stas Bekman wrote:
> >
> > It won't be too confusing for most of the people since they will just use
> > Apache::Registry or Apache::PerlRun as before, without even knowing that
> > these are not real modules anymore.
> Figured such "passive use" would be backward compatible, the custom
> directives should allow for slight tweaking without having to think of
> cooking or subclassing.
> I'm still pondering whether the complexity of cooking is worth the
> minor speed improvement; it's a really elegant approach to
> hyper-optimizing, but the few extra conditionals that cooking
> optimizes away will be extremely minor compared with both (A) the
> speedup of using Apache::Registry in the first place and (B) the speed
> of the script.  I think the performance benefits will be swamped by
> the other parts of the system.  But if it's really simple to
> implement, I see no problem with it.  It is a neat technical approach.

Well, we will do it because we can and because it doesn't take much effort,
just like you said. And the best thing is that it doesn't cast anything in
stone: if at some point we find a better way, or somebody just wants to
go the academic way and make it OO and do all the inheritance stuff, they
still can.

The point is to write a library of standalone functions implemented in
Perl and C, and let the final products be Lego-like: just pick what you
want. If you want to have 10 different Registry modules doing similar
yet different things, you can do it with little effort.

As for performance, I do want to have the Registry stuff as fast as
possible, even if the performance gains are very small. Again, we can
afford to do that without spending too much time, so I see no reason why
we shouldn't do it.

Rewriting some of the Registry components in XS/C would definitely be a
good thing to do, but this can always be done later, especially since Doug
has already written Apache::PerlRunXS, which I didn't even know about. Of
course it'll have to be adjusted for Apache 2.0 and possible threading
issues, but it's a good start.

> > > For instance, setting root of the namespace the script/handler is
> > > compiled in to and the algorithm used to generate the relative part of
> > > the namespace should be able to be set by a parameter.  That might be
> > > configured using perlish interpolation:
> > >
> > >     PerlRegistryNameSpace MyRegistryRoot::${filename}
> > >
> > > where $filename/$inode/$uri/$vh is escaped to be a nice package name.
> > > This leads to the natural
> > >
> > >     PerlRegistryNameSpace MyRegistryRoot::${vh}::${filename}
> > >
> > > and the slightly unnatural:
> > >
> > >     PerlRegistryNameSpace MyRegistryRoot::&{My::key_generating_sub}
> > >
> > > or:
> > >
> > >     PerlRegistryNameSpace &{My::key_generating_sub}
> > >
> > > to allow some subroutine to be called to generate the name.
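
A minimal sketch of the "escaped to be a nice package name" step, assuming a simple hex-escaping scheme. The helper name namespace_for() and the escaping rules are hypothetical, and this is not the algorithm Apache::Registry actually uses:

```perl
use strict;
use warnings;

# Hypothetical helper: map a file path onto a package name under a
# configurable root, as PerlRegistryNameSpace Root::${filename} might.
sub namespace_for {
    my ($root, $filename) = @_;
    my @parts = grep { length } split m{/+}, $filename;
    for my $part (@parts) {
        # Encode every character that is not legal in an identifier.
        $part =~ s/([^A-Za-z0-9_])/sprintf("_%02x", ord $1)/ge;
        # Avoid a leading digit in a package name component.
        $part = "_$part" if $part =~ /^\d/;
    }
    return join '::', $root, @parts;
}

print namespace_for('MyRegistryRoot', '/www/perl/test-1.pl'), "\n";
# prints MyRegistryRoot::www::perl::test_2d1_2epl
```

The same function could be handed a virtual host name as an extra leading component to get the ${vh}::${filename} form.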
> >
> > I think that all directives in httpd.conf or cook() should accept a
> > reference to a sub, so you can write any sub anywhere and use it. One of
> > the most important benefits of it is using NOPs.
> >
> > package Foo;
> > use constant NOP => sub {};
> >
> > now is it really getting optimized away at compile time?
> Depends on how you use it.  Do you intend to do something like having
> PerlRegistryDoStat Foo::NOP cause a cooked package to do something like
>    *do_stat = *Foo::NOP ;
>    ...some lines later...
>    $mtime = (do_stat($filename))[9];
> ?  I think that'll barf due to passing in a parameter to a sub with a ()
> prototype:
>    $ perl -we "sub NOP(){};NOP('a')"
>    Too many arguments for main::NOP at -e line 1, at end of line
>    Execution of -e aborted due to compilation errors.

Who said that we have to use prototypes? Or, if we don't, won't the sub
call be optimized away?

$ perl -lwe "sub NOP{};NOP('a')"

Is there some doc explaining how things get optimized away, other than
the source code?
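
One way to check this without reading the source is to look at what the compiler kept, using B::Deparse. The sketch below assumes two hypothetical subs, with_constant() and with_nop(): a condition on a false "use constant" value is folded at compile time, while a call to an ordinary empty sub is still a call.

```perl
use strict;
use warnings;
use B::Deparse;

use constant DO_STAT => 0;    # a real compile-time constant
sub NOP { }                   # an ordinary empty sub, no prototype

sub with_constant {
    my $fn = shift;
    my $mtime;
    $mtime = (stat $fn)[9] if DO_STAT;   # condition is known at compile time
    return $mtime;
}

sub with_nop {
    my $fn = shift;
    my $mtime = (NOP($fn))[9];           # still a sub call at run time
    return $mtime;
}

my $deparse = B::Deparse->new;
print "with_constant: ", $deparse->coderef2text(\&with_constant), "\n";
print "with_nop:      ", $deparse->coderef2text(\&with_nop), "\n";
```

The stat call should be missing from the first deparsed body, while the NOP call should still be present in the second.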

> Perhaps better for httpd.conf readability to use a special value of
> "Off" to force a NOP, "On" for default operation, and "SubName" to cook
> code like:
>    use constant get_mtime => "" ;
>    ...some lines later....
>    my $mtime ;
>    $mtime = lc get_mtime eq "on" ? (stat $fn)[9] : get_mtime->()
>       if get_mtime && lc get_mtime ne "off" ;
> That rather complex expression should optimize away nicely for /off/i ;
> Or did I misunderstand your intent?

Yes, you understood it correctly. In fact, I haven't yet thought about the
details. I suppose we can always build the code via AUTOLOAD; then we can
really optimize things ourselves, by writing only the code that we need.
This will make the code harder to write, but it's possible. The real
problem will be with XS versions of the subs, where it simply won't work
unless we use lots of #ifdefs. I guess that's the only way to go there.
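
A rough sketch of "writing only the code that we need", assuming a hypothetical cook() that assembles the package source from pieces and compiles it with a string eval. The option names and package names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical cook(): emit only the code the requested options need.
sub cook {
    my (%opt) = @_;
    my $pkg = $opt{package};

    my @body = ('my ($fn) = @_;', 'my %info = (filename => $fn);');
    # The stat code is generated only on request; otherwise it never exists.
    push @body, '$info{mtime} = (stat $fn)[9];' if $opt{do_stat};
    push @body, 'return \%info;';

    my $code = join "\n",
        "package $pkg;",
        "sub file_info {", @body, "}",
        "1;";
    eval $code or die "cook failed for $pkg: $@";
    return $code;
}

cook(package => 'My::RegistryNoStat', do_stat => 0);
cook(package => 'My::RegistryStat',   do_stat => 1);

my $plain = My::RegistryNoStat::file_info($0);
my $stat  = My::RegistryStat::file_info($0);
print exists $plain->{mtime} ? "stat\n" : "no stat\n";
print exists $stat->{mtime}  ? "stat\n" : "no stat\n";
```

The cooked package without do_stat carries no conditional at all, which is the point of cooking over checking a flag at run time.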

> > Or is there some better way to make NOP CODE reference?
> I don't think there's a really good way to do a NOP code ref.  Of course
> you could pass the filename in a global to avoid the prototype vs. arg.
> passing issue, but that's a tad messy.

Nuh, we should stay away from using globals. Hmm, is it possible to write
the NOP in XS? It still has to be a CODE reference. Any XS gurus?

> > > It'd be nice if the new Apache::Registry could use or just cooperate
> > > with Apache::Reload for autostat/autoreload.
> >
> > That won't really work. Because the files on the disk aren't the same as
> > registry packages, unless Apache::Reload will try to use a known interface
> > to package_reload in the registry space, but I see no way how
> > Apache::Reload will figure out what registry package in %INC belongs to
> > what Apache::RegistryFoo module.
> Not sure it will out of the gate, but either tweaking Apache::Reload to
> know about files so
>     use Apache::Reload "/path/to/perlrun/script" ;
> would DWIM (analogous to require "/path/to/perllib" vs. require Module).
> Or a subclass of Apache::Reload, altering package_to_module()'s
> behavior.  Making Apache::Reload more flexible would be a good thing.

I guess, just like with Apache::DBI, we can write some
Apache::Reload::Registry and make it and Apache::Reload aware of each
other, so that they interact in the way you suggest.

> > Hmm, that's an interesting idea. But won't it confuse users even more?
> > this will require them to move all the non-handler code up, or we need
> > something like: ##__REGISTRY_HANDLER_END__ as well.
> My experience is that most CGI scripts require munging to run nicely
> under Apache::Registry anyway, but YMMV.  Just looking to make the
> munging easier.

I doubt that the code example in your original post is a typical one.
Usually people scatter their handler code across many subs, and I'm not
sure how ##__REGISTRY_HANDLER_END__ can really help. That doesn't mean
that we shouldn't try to make things easier, especially for those who
wrote well-structured code in the first place. But for the latter it
should be very easy to do that anyway.

But what problem are we trying to solve here? The only problem I know of
is the closure effect, which happens when we wrap subs inside the
handler() sub. Or are you trying to solve something else?
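
For reference, a small standalone demonstration of the closure effect, with hypothetical sub names. The first handler() mimics a script whose own named sub ends up nested inside the wrapping handler() sub; the second shows one common workaround, passing the value as an argument:

```perl
use strict;
use warnings;
no warnings 'closure';   # silence "Variable will not stay shared" for the demo

# Roughly what a script looks like once it has been wrapped:
# its named sub is now nested inside another named sub.
sub handler {
    my $counter = shift;
    sub show_counter { return $counter }   # bound to the first $counter only
    return show_counter();
}

# Workaround: pass the value explicitly instead of sharing a lexical.
sub handler_fixed {
    my $counter = shift;
    return show_arg($counter);
}
sub show_arg { return $_[0] }

print join(",", handler(1), handler(2), handler(3)), "\n";                    # 1,1,1
print join(",", handler_fixed(1), handler_fixed(2), handler_fixed(3)), "\n";  # 1,2,3
```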

> > this custom parse() function can be always provided and used if wanted
> > instead of the default one, rather than complicating the parsing of the
> > default sub.
> Don't care about adding minor complication to that parser, myself, since
> the users never have to read it and, unless you do something non-minor,
> the performance hit won't be noticeable except when stat()ing each time,
> and then the time taken to stat() and to recompile the resulting perl
> code will swamp the extra parse time, and the admin is specifically
> asking to be lower performance and developer-friendly anyway in that
> case.  But I agree the parser should only be complexified for good
> reason, like if ##__REGISTRY_HANDLER__ is deemed useful.

That shouldn't be a problem. If we design this system to be as flexible as
possible, we can try to provide a new handler which does what you suggest,
and then, if it works well and doesn't add a big overhead, we can
replace the previous one with it. Adding ##__REGISTRY_HANDLER__
shouldn't add any overhead at all, but only in the case where the script
is preloaded.

> > >    - A source filter, to allow burgeoning young mod_perl hackers to
> > >      develop their radical new idea (ie a new templating system) easily.
> > >      This is probably best done as an overridable method like
> > >      parse_source().  The base class parse_source would (in my perfect,
> > >      little world) parse the source for the ##__BEGIN__HANDLER, __END__,
> > >      __DATA__, etc. tokens and pass the resulting chunks to the little
> > >      templating engine needed to do the following...
> >
> > Yup, just like your previous idea. write your own parse_source() and do
> > whatever you want.
> Well, not quite; the previous idea is not about what I'd want, it's
> about a specific feature that might make it easier to tweak CGI
> scripts to run under Apache::Registry.  This one's about making
> Apache::Registry pervertable ;-).  The former's an idea for a built in
> feature, the latter is for built in extensibility.

Yup, but the answer doesn't change. You are free to provide a new
parse_source and gen_code (as you explain later) and it should just work.
The only thing we should make sure of is that we think it through and put
in all the hooks that we might need in the future (of course we can add
these later as well). These hooks can then be used with any custom code,
so source filters shouldn't be a problem at all.

The whole system pretty much looks like the Apache request cycle, where we
register some hook to do something. The only difference is that we allow
only one hook per phase. Or do you think you may want to run a few
different hooks and make it really like Apache hooks, where each hook
decides whether it wants to handle the current phase or not? I think that
would slow things down, since we would need to provide the whole mechanism
with its extra calls.
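
A sketch of the one-hook-per-phase idea, assuming a plain hash of code refs. The names %hook, set_hook() and compile_script() are hypothetical, and the default parse_source and gen_code bodies are deliberately trivial:

```perl
use strict;
use warnings;

# Hypothetical phase table: exactly one code ref per phase.
my %hook = (
    parse_source => sub { my ($src) = @_; $src =~ s/^__END__.*//ms; $src },
    gen_code     => sub { my ($body, $pkg) = @_;
                          "package $pkg; sub handler { $body\n} 1;" },
);

# A custom module overrides a phase by replacing its single slot.
sub set_hook { my ($phase, $code) = @_; $hook{$phase} = $code }

sub compile_script {
    my ($src, $pkg) = @_;
    my $body = $hook{parse_source}->($src);
    my $code = $hook{gen_code}->($body, $pkg);
    eval $code or die "compile failed: $@";
    return $pkg->can('handler');
}

my $handler = compile_script(qq{return "hello";\n__END__\nignored\n}, 'My::Script1');
print $handler->(), "\n";   # hello
```

With a single slot per phase, dispatch is one hash lookup and one call; a chain of hooks that may decline would need a loop and a return-code convention on top of that.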

> > :) this will be possible as well.
> >
> > > This would turn Apache::Registry in to a templating system backend
> > > and/or a custom application framework, whereby I could use Perl files
> > > ("scripts") instead of template files.
> >
> > Yup. Also check Apache::Template!
> Umm, that's not it at all; that's a prebuilt handler to call tt2.  What
> I'm suggesting is more like taking the current proposed model of:
>    cgiscript -> $code = parse_source() -> gen_code( $code ) -> eval
>    $code is the body of the handler: "use CGI ; ...."
> and making $code be a HASH so parse_source()+gen_code() can do some
> rearranging and so extra values (like source file name and mtime)
> can be passed in to gen_code(); and making gencode by default be a very
> simple application of a template to the passed-in HASH:
>    cgiscript -> $hash = parse_source() -> gen_code( $hash ) -> eval
>    $hash is { fn => "...", global_code => "...", code => "..." }
> .  gen_code() takes $hash and join()s it, just like the above gen_code()
> takes $code and join()s it.
> I'm taking it just a step further and proposing that gen_code()'s join()
> be built by compiling a simple template in to a join().  I've done this
> before as an experiment to see how small and fast a templating system
> could go without dipping in to C code; it produces small, fast code
> compared to techniques found on CPAN.  That's not important in the
> average templating situation, so I haven't foisted it on the world, but
> it does demonstrate that a sub like gen_code can be a converted template
> and be *exactly* as fast as hand-written join().
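
To make the template-to-join() idea concrete, here is a minimal sketch under assumed conventions: a made-up [% name %] placeholder syntax and a hypothetical compile_template() that turns the template into a sub doing a single join() over the passed-in hash. It is not Barrie's implementation.

```perl
use strict;
use warnings;

# Hypothetical template syntax: [% name %] is replaced by $hash->{name}.
my $template = "package [% package %];\n[% global_code %]\n"
             . "sub handler {\n[% code %]\n}\n1;\n";

# Compile the template once into a sub that performs one join().
sub compile_template {
    my ($tmpl) = @_;
    my @parts = split /\[%\s*(\w+)\s*%\]/, $tmpl;   # literal, key, literal, ...
    my @expr;
    for my $i (0 .. $#parts) {
        if ($i % 2) {
            push @expr, "\$_[0]{$parts[$i]}";       # hash lookup for a key
        }
        else {
            (my $lit = $parts[$i]) =~ s/([\\'])/\\$1/g;
            push @expr, "'$lit'";                   # quoted literal text
        }
    }
    my $sub = eval 'sub { join "", ' . join(', ', @expr) . ' }'
        or die "template did not compile: $@";
    return $sub;
}

my $gen_code = compile_template($template);
print $gen_code->({
    package     => 'My::Script',
    global_code => 'use strict;',
    code        => 'return "ok";',
});
```

After compilation, gen_code does no template parsing at all, which is why it can match a hand-written join().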

I guess I'm not very clear on what you are trying to accomplish. Will the
template you are talking about look something like this?


code here

the code that will be wrapped into sub handler {}

I'm probably on the wrong track, and if I am, please give me a short example
of the template you are talking about. But if I'm on the right track, why
not write a plain handler in the first place?

> So, there would be no speed hit; there would be no memory hit unless a
> non-standard template is used (the template converter could be lazily
> loaded, and the default gen_code template can be pre-compiled to Perl
> code before distribution); and Apache::Registry gets more flexible.
> I'm not trying to replace or use something like Apache::Template; if it
> sounds like I am then I've not been clear enough.  I think using the
> hash passing and an overloadable gen_code() might be the optimal
> suggestion, the templating approach would be really nice if lots of
> people wanted to customize gen_code(), but I don't think a lot of people
> do; just letting it be overridden is enough...
> > I'd like to hear more ideas about solving the closure problem (in addition
> > to the one you've suggested with ##__APACHE__HANDLER__).
> So would Larry and perl5-porters, I'm sure...


It's different for p5p, because there you can avoid writing code that
creates undesirable effects. I don't think this is a problem for p5p, since
the closure effect is a feature. That is not the situation with
Apache::Registry, where it happens as an undesired side effect in most
cases.

> 'nuther issue: what's the plan to make it threaded-MPM compatible given
> that it does a chdir() and calls

Hmm, I really have to start digging into the threading issues; I've
ignored them so far. Is chdir() not thread-safe? If you chdir in one
thread, does it affect other threads? Also, what other calls are you
talking about?
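
For context: on POSIX systems the current working directory belongs to the process, not to a thread, so a chdir() in one thread is seen by all of them. One possible way around it, sketched with a hypothetical resolve_relative() helper, is to resolve paths against the script's directory instead of changing directory at all:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: resolve a path the way a script would see it after
# chdir(dirname($script)), but without touching the process-wide cwd.
sub resolve_relative {
    my ($script, $path) = @_;
    return $path if File::Spec->file_name_is_absolute($path);
    return File::Spec->rel2abs($path, dirname($script));
}

print resolve_relative('/www/perl/app.pl', 'conf/app.conf'), "\n";
```

This only helps for paths that go through such a helper; scripts that open relative paths directly would still depend on the working directory.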

Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker       mod_perl Guide
