httpd-apreq-dev mailing list archives

From "Issac Goldstand" <mar...@beamartyr.net>
Subject Re: dev question: apreq 2 as a filter?
Date Mon, 26 Aug 2002 14:59:27 GMT

----- Original Message -----
From: "William A. Rowe, Jr." <wrowe@rowe-clan.net>
To: "Issac Goldstand" <margol@beamartyr.net>
Cc: "William A. Rowe, Jr." <wrowe@rowe-clan.net>; "apreq list"
<apreq-dev@httpd.apache.org>; "Joe Schaefer" <joe@sunstarsys.com>; "Stas
Bekman" <stas@stason.org>
Sent: Monday, August 26, 2002 5:30 PM
Subject: Re: dev question: apreq 2 as a filter?


> At 02:57 AM 8/25/2002, Issac Goldstand wrote:
>
> >Let me restate this whole process, as I see it happening, using the
> >warehouse lingo that we were using before.  I believe that we're on the
> >same wavelength here, but want to make sure...  I see three major
> >components here:  The filter, the parser, and the "warehouse manager"...
>
> I'll split out your points then and answer specific issues.
>
> >(1) First of all, the only real way that apreq should be installed is as
> >an input filter.
> >(2) The filter should be installed as early as possible and
> >(3) immediately create an empty data structure in memory - I'm not going
> >to say where (notes table should be fine if it's still there in Apache2),
> >because that's probably an entire conversation on its own.  In any case,
> >user intervention SHOULD take place as early as possible in Apache's
> >request-phase chain.
> >((4)Frankly, we may find it useful to provide
> >httpd.conf directives to enable users to somewhat tweak the necessary
> >configurations, and provide a handler that runs as early as possible to
> >scan directives for each location before it starts.  The action taken
> >would This should include a directive to *uninstall* (or disable, or
> >whatever) the apreq filter, too).
>
> The more I consider what apreq must accomplish, the more I'm against
> user 'config' of the apreq filter.  It should be programmatically
> configured, by all of the modules that want it injected.
>
> That means one filter module could supersede another module's
> requirements.  That is a bad thing.  So we need to use a greatest
> common denominator configuration scheme.
>
> So module A. expects some POST variables that it expects are no
> greater than 8kb.  module B. expects to deal with 64kb or greater,
> and is willing to handle a multipart-form upload.  In this case, module
> B registered a file upload callback and requests that the set-aside
> or 'prescan' limit is 64kb.  Those should override module A's miserly
> 8kb expectation.
>
> Even if module A calls apreq to inject itself after module B, the GCD
> needs to win.
>

Yes, assuming we're talking about the same request.  My point was that each
request should re-configure apreq to prepare the data in the best possible
manner based on the parameters of that particular request.  My visualization
of your example would be like this:

A) apreq injected into the filter chain.  Configuration table is created in
notes and set to default values.
B) Module A initializes itself.  Calls on apreq to get instance of request
object and attempts to set 'prebuffer' flag to 8kb.  Default value is, say,
4kb.  8 > 4, therefore apreq changes the value and returns OK.
C) Module B initializes itself.  Calls on apreq to get instance of request
object and attempts to set 'upload OK' to true.  Since it defaults to false,
apreq changes the value.  Next it attempts to change prebuffer to 64kb.
Since this is greater than 8 it also changes and returns OK.

Now, the reverse situation is (A) then (C) and then:

D) Module A initializes itself.  Calls on apreq to get instance of request
object and attempts to set 'prebuffer' flag to 8kb.  Value is already 64.  8
< 64, so it does NOT change the value.  Returns OK.
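
To make the ordering argument concrete, here is a rough sketch of that
"largest request wins" rule.  It is Python for brevity rather than apreq's C,
and every name in it (ApreqConfig, request_prebuffer, request_upload,
module_a, module_b) is made up for illustration; none of it is an existing
apreq API.  The 4kb default is the "say, 4kb" assumed above.

```python
from dataclasses import dataclass

KB = 1024


@dataclass
class ApreqConfig:
    """Hypothetical per-request settings table (e.g. kept in notes)."""
    prebuffer: int = 4 * KB   # assumed default, as in step (A)
    upload_ok: bool = False

    def request_prebuffer(self, size: int) -> str:
        # Only ever raise the limit; a smaller request is already satisfied.
        self.prebuffer = max(self.prebuffer, size)
        return "OK"

    def request_upload(self) -> str:
        # Once any module allows uploads, the flag stays set.
        self.upload_ok = True
        return "OK"


def module_a(cfg: ApreqConfig) -> str:
    # Module A only wants small POST variables.
    return cfg.request_prebuffer(8 * KB)


def module_b(cfg: ApreqConfig) -> str:
    # Module B wants uploads and a larger prescan limit.
    cfg.request_upload()
    return cfg.request_prebuffer(64 * KB)


# A then B, and B then A, should end in the same configuration.
a_then_b = ApreqConfig()
module_a(a_then_b)
module_b(a_then_b)

b_then_a = ApreqConfig()
module_b(b_then_a)
module_a(b_then_a)
```

Both orders leave prebuffer at 64kb with uploads allowed, and every call
returns OK, so neither module has to know the other exists.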

> >Moving along, (5) mechanisms to override the default input method, which
> >is input and share with other filters, should be provided, and can be
> >invoked anywhere up until the first call to ap_get_brigade.  Frankly, it
> >ought to work afterwards also, but then we run the risk of other filters
> >choking on mangled data.  We wouldn't want another filter to do that to
> >us, so we oughtn't do it to them.
>
> NO!  You cannot supersede the defined behavior!  Otherwise the oddball
> module will -break- all other installed filters for the request!!!
>
> Think of superclassing a C object.  In this HTTP schema, we have variables
> and values.  We need a clean definition of how to read/retrieve them.  If
> that definition is reasonably extensible, the modules don't have to know
> that the received data was sent in multipart-form or in XML format.
>
> Extensible and large values will have to be supported in an abstract way.
> This is what makes brigades and metadata so [potentially] appealing.
> Those already have some definitions we can extend, that I think would
> cover nearly any future potential use cases.

Er, I think I'm missing the gist of your argument.  Maybe just because I'm
tired just now.  I'll try to read this again tomorrow and send you mail
off-list if I need you to clarify...

> >In any case, (6) the configuration for the
> >request-specific parameters of the apreq call should be read during the
> >first callback of the actual filter (eg, first time ap_get_brigade is
> >called from anywhere).
>
> I'm suggesting the other filters and handler that want the data spell out
> their requirements.  If those can be flexible [one says pre-cache 8kb,
> and another asks to precache 64kb, let's let the 64kb requestor win.]

Agreed - see above.

> >(7)At that point, a flag is set in our little
> >request-specific apreq notepad to tell us that we've started munching
> >data, and that (7a) requests for behavior changes for the
> >request-specific apreq call should fail (not silently - it should return
> >failure status to user - possibly with a reason) and (7b) the warehouse
> >doors are now open, but the warehouse is flagged as being "stocking in
> >progress" (the warehouse should most likely NOT be in the same place as
> >the configuration directives - the former potentially needs lots of
> >room, while the latter doesn't).
>
> 7a), why?  If we could keep pre-fetching in order to satisfy a given
> request, let's do so.  Remember that several filters and a handler may
> all be looking for apreq fields.  It's best to play nicely with all of
> them.
>
> As long as our input filter keeps setting the data aside for subsequent
> ap_get_brigade() calls, and can later satisfy them, we should be fine.

Well, I'm not going to say "you're wrong".  What I'm trying to do here,
though, is to avoid any module A getting any sort of nasty surprise as to
how apreq is handling its data just because module B decided to screw around
with apreq once it started slurping in the data...  Let's just say that 7a
is because I'm being cautious...
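
For what it's worth, the cautious version of 7a that I have in mind looks
roughly like the sketch below.  Again this is Python for brevity, and
RequestNotes, set_prebuffer and on_first_get_brigade are invented names, not
anything apreq provides; whether changes really should be refused after the
first read is exactly the open question in this thread.

```python
class RequestNotes:
    """Hypothetical per-request apreq notepad."""

    def __init__(self):
        self.prebuffer = 4096          # assumed default
        self.reading_started = False   # the flag from point (7)

    def on_first_get_brigade(self):
        # Called once, when the filter is first asked for data.
        self.reading_started = True

    def set_prebuffer(self, size):
        """Return (ok, reason); never fail silently."""
        if self.reading_started:
            return (False, "request body is already being read")
        self.prebuffer = max(self.prebuffer, size)
        return (True, "")


notes = RequestNotes()
early = notes.set_prebuffer(8192)    # accepted: nothing has been read yet
notes.on_first_get_brigade()
late = notes.set_prebuffer(65536)    # refused, with a reason for the caller
```

The point of returning a reason is that module B finds out immediately that
its change did not take effect, and module A never sees apreq's behavior
shift underneath it.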

> >(8)If the
> >"exclusive mode" flag is set for this request (file upload? It doesn't
> >matter - what matters is that this is the Apache 1.x style apreq that
> >everyone's so keen on having in apreq2), then we simply don't pass the
> >brigade on to the next filter, unless, of course, it's EOS.
>
> Bzzt.... that one's wrong.  There is no exclusive mode in this model.  All
> consumers must have access to all the input data.  When a collection of
> PHP, Perl and some special purpose filters all want to see the variables,
> they will all see them.  The only problem, huge posts [e.g. file uploads]
> are messy.  I have a thought on that one, too.
>
> I'll propose a suggestion for the worst case [file upload variable] in
> another post to show that this isn't a problem.

Well, to be honest, I don't see a need for it either - but it seemed that
everyone was trying to get an Apache 1.x-like model in apreq2.  I was trying
to meet that want.  If no one wants it, then chuck it...

> >(9)Also at this
> >point (this is still *first* ap_get_brigade call only), we check to see
> >if the "populate-at-once" flag is set for this request.   We can have a
> >mechanism where we continuously call ap_get_brigade until we hit EOS to
> >do this.  Note that the "populate-at-once" and "exclusive" modes can thus
> >run independently of one another.
>
> If we give the apreq consumer a simple call to ask for a given variable,
> and it's not yet present, we can continue to consume the client body and
> set it aside for the filter chain, until that 'variable' has been read
> completely or the entire client request body is read.

I said exactly that below.  It could be, though, that some module calls
$q->parse, which tells apreq to finish reading the entire request.  That
would be an example of populate-at-once mode.  I'm sure I could think of
others too...
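
A toy illustration of the two modes, in Python for brevity.  LazyBody, param
and parse are hypothetical names; the iterator of (name, value) pairs merely
stands in for repeated ap_get_brigade calls, and real brigades would of
course not arrive as tidy pairs.

```python
class LazyBody:
    """Hypothetical stand-in for the apreq parser plus its input filter."""

    def __init__(self, pairs):
        self._source = iter(pairs)  # stands in for ap_get_brigade calls
        self.params = {}
        self.reads = 0              # how many pairs were pulled so far
        self.eos = False

    def _read_one(self):
        try:
            name, value = next(self._source)
        except StopIteration:
            self.eos = True
            return
        self.reads += 1
        self.params[name] = value

    def param(self, name):
        # On-demand mode: read only as far as this lookup needs.
        while name not in self.params and not self.eos:
            self._read_one()
        return self.params.get(name)

    def parse(self):
        # Populate-at-once mode: drain the body until EOS.
        while not self.eos:
            self._read_one()


body = LazyBody([("user", "issac"), ("lang", "perl"), ("file", "data")])
user = body.param("user")      # pulls one pair only
reads_after_lookup = body.reads
body.parse()                   # pulls everything that is left
```

Asking for "user" stops after the first pair; calling parse() afterwards
reads the remainder, which is the populate-at-once behavior.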

> >(10) Lastly, once EOS is received, we mark the
> >warehouse as "warehouse full" in the request-specific configuration
> >notepad.  What remains is the warehouse manager.
>
> Yup.  We definitely need the NOT_READ, IN_PROGRESS, COMPLETE
> and NO_BODY placeholder :-)

Forgive me on my lack of knowledge of Apache 2.0 internals, but...  What's
that?

> >(11)I think we need a 3-key system
> >to manage the warehouse entries: "Data/Name", "Value" and some flag
> >(bit?) "Status".  To do this, the parser would start populating entries
> >in the warehouse as it comes in (from the filter).
>
> Sounds right.  I was picturing each residing in metadata + data buckets,
> which I will write up a description for.
>
> >(12)As soon as each entry is
> >completed in the warehouse, the status flag should be set to indicate
> >"in-stock".  (13)An entry in the per-request configuration "notepad" can
> >contain the name of the current "item" being imported into the warehouse.
> >(14)Calls to get data from warehouse (this is the "warehouse manager"
> >part) should scan the warehouse entries.  (14a)If an item is "in-stock",
> >no additional data-collection is needed.  (14b) If an item is in, but not
> >flagged, we call ap_get_brigade until it's flagged "in-stock" by the
> >parser (ONLY the parser can import to the warehouse, whereas ONLY the
> >warehouse manager can actually read items from the warehouse).  (14c) If
> >the data is not found and the "warehouse full" flag is set, the call
> >fails.  (14d) Otherwise, we continue to call ap_get_brigade (either
> >explicitly from the parser, or implicitly by simply setting the
> >"populate-at-once" flag and calling ap_get_brigade once from the parser
> >[I'd say explicit is better, simply because it allows us to continually
> >check the warehouse for the addition of our data and stop calling
> >ap_get_brigade once our data is "in-stock"].)  Once we hit "warehouse
> >full" (note that the warehouse manager doesn't care about EOS - all it
> >cares about is "warehouse full") and haven't found our data, the call
> >fails.
>
> This all sounds about right.  The biggest problem is consuming the
> occasional huge item that exceeds a sanity threshold, e.g. a file upload
> item, and that case I'll spell out in another post when I have a few
> minutes.
>
> >I think that about covers the lifespan of an apreq call.  What do you
> >people think?
>
> I hate new metaphors if we require programmers to code to them :-)
> I don't mind them at all for illustration though, yours works pretty well.

Never said we have to officially *document* the warehouse metaphor, but if
we start using this for planning here, it gives us all a clear shared view
on which component of apreq we're dealing with. :-)
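
To pin down what I mean by points (11) through (14), here is a minimal
sketch of the warehouse, in Python for brevity.  Warehouse, Status,
_parser_step and get are all hypothetical names, and the (name, fragment,
is_last) tuples are a made-up stand-in for whatever the parser extracts from
each brigade.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "stocking in progress"
    IN_STOCK = "in-stock"


class Warehouse:
    def __init__(self, brigades):
        self._brigades = iter(brigades)  # stands in for ap_get_brigade
        self.entries = {}                # name -> [value, Status]   (11)
        self.full = False                # "warehouse full"          (10)

    def _parser_step(self):
        """ONLY the parser imports into the warehouse."""
        try:
            name, fragment, is_last = next(self._brigades)
        except StopIteration:
            self.full = True             # EOS seen by the parser
            return
        entry = self.entries.setdefault(name, ["", Status.IN_PROGRESS])
        entry[0] += fragment
        if is_last:
            entry[1] = Status.IN_STOCK   # (12)

    def get(self, name):
        """The warehouse manager; returns None when the call fails."""
        while True:
            entry = self.entries.get(name)
            if entry is not None and entry[1] is Status.IN_STOCK:
                return entry[0]          # (14a) nothing more to collect
            if self.full:
                return None              # (14c) not found, warehouse full
            self._parser_step()          # (14b)/(14d) keep reading


stock = Warehouse([("a", "he", False), ("a", "llo", True), ("b", "x", True)])
value_a = stock.get("a")           # reads two fragments, then stops
b_seen_early = "b" in stock.entries
missing = stock.get("missing")     # drains the body, then fails
```

Note that get() only looks at the "warehouse full" flag and never at EOS
itself, and that it stops reading as soon as the requested item is in stock.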

  Issac

