Date: Mon, 26 Aug 2002 09:30:57 -0500
From: "William A. Rowe, Jr."
To: "Issac Goldstand"
Cc: "William A. Rowe, Jr.", "apreq list", "Joe Schaefer", "Stas Bekman"
Subject: Re: dev question: apreq 2 as a filter?
Message-Id: <5.1.0.14.2.20020826061017.02867448@pop3.rowe-clan.net>
In-Reply-To: <040101c24c0d$168bc620$1a0aa8c0@deepthought>

At 02:57 AM 8/25/2002, Issac Goldstand wrote:
>Let me restate this whole process, as I see it happening, using the
>warehouse lingo that we were using before.  I believe that we're on the same
>wavelength here, but want to make sure...  I see three major components
>here: The filter, the parser, and the "warehouse manager"...

I'll split out your points then and answer specific issues.

>(1) First of all, the only real way that apreq should be installed is as an
>input filter.
>(2) The filter should be installed as early as possible and
>(3) immediately create an empty data structure in memory - I'm not going to
>say where (notes table should be fine if it's still there in Apache2),
>because that's probably an entire conversation on its own.  In any case,
>user intervention SHOULD take place as early as possible in Apache's
>request-phase chain as possible.
>((4)Frankly, we may find useful to provide
>httpd.conf directives to enable users to somewhat tweak the necessary
>configurations, and provide a handler that runs as early as possible to scan
>directives for each location before it starts.  The action taken would This
>should include a directive to *uninstall* (or disable, or whatever) the
>apreq filter, too).

The more I consider what apreq must accomplish, the more I'm against
user 'config' of the apreq filter.  It should be programmatically configured,
by all of the modules that want it injected.

That means one filter module could supersede another module's requirements.
That is a bad thing.  So we need to use a greatest common denominator
configuration scheme.

Say module A expects some POST variables that it assumes are no greater
than 8kb.  Module B expects to deal with 64kb or greater, and is willing to
handle a multipart-form upload.  In this case, module B registered a file
upload callback and requests that the set-aside or 'prescan' limit be 64kb.
Those should override module A's miserly 8kb expectation.  Even if module A
calls apreq to inject itself after module B, the GCD needs to win.

>Moving along, (5) mechanisms to override the default input method, which is
>input and share with other filters, should be provided, and can be invoked
>anywhere up until the first call to ap_get_brigade.  Frankly, it ought to
>work afterwards also, but then we run the risk of other filters choking on
>mangled data.
>We wouldn't want another filter to do that to us, so we
>oughtn't do it to them.

NO!  You cannot supersede the defined behavior!  Otherwise the oddball
module will -break- all other installed filters for the request!!!

Think of superclassing a C object.  In this HTTP schema, we have variables
and values.  We need a clean definition of how to read/retrieve them.  If
that definition is reasonably extensible, the modules don't have to know
whether the received data was sent in multipart-form or in XML format.

Extensible and large values will have to be supported in an abstract way.
This is what makes brigades and metadata so [potentially] appealing.  Those
already have some definitions we can extend, that I think would cover
nearly any future potential use cases.

>In any case, (6) the configuration for the
>request-specific parameters of the apreq call should be read during the
>first callback of the actual filter (eg, first time ap_get_brigade is called
>from anywhere).

I'm suggesting the other filters and handler that want the data spell out
their requirements.  Those can be flexible: if one says pre-cache 8kb, and
another asks to pre-cache 64kb, let's let the 64kb requestor win.

>(7)At that point, a flag is set in our little
>request-specific apreq notepad to tell us that we've started munching data,
>and that (7a) requests for behavior changes for the request-specific apreq
>call should fail (not silently - it should return failure status to user -
>possibly with a reason) and (7b) the warehouse doors are now open, but the
>warehouse is flagged as being "stocking in progress" (the warehouse should
>most likely NOT be in the same place as the configuration directives - the
>former potentially needs lots of room, while latter doesn't).

7a), why?  If we could keep pre-fetching in order to satisfy a given request,
let's do so.  Remember that several filters and a handler may all be looking
for apreq fields.  It's best to play nicely with all of them.
As long as our input filter keeps setting the data aside for subsequent
ap_get_brigade() calls, and can later satisfy them, we should be fine.

>(8)If the
>"exclusive mode" flag is set for this request (file upload?  It doesn't
>matter - what matters is that this is the Apache 1.x style apreq that
>everyone's so keen on having in apreq2), then we simply don't pass the
>brigade on to the next filter, unless, of course, it's EOS.

Bzzt.... that one's wrong.  There is no exclusive mode in this model.
All consumers must have access to all the input data.  When a collection
of PHP, Perl and some special purpose filters all want to see the variables,
they will all see them.

The only problem: huge posts [e.g. file uploads] are messy.  I have a
thought on that one, too.  I'll propose a suggestion for the worst case
[file upload variable] in another post to show that this isn't a problem.

>(9)Also at this
>point (this is still *first* ap_get_brigade call only), we check to see if
>the "populate-at-once" flag is set for this request.  We can have a
>mechanism where we continuously call ap_get_brigade until we hit EOS to do
>this.  Note that the "populate-at-once" and "exclusive" modes can thus run
>independently of one another.

If we give the apreq consumer a simple call to ask for a given variable, and
it's not yet present, we can continue to consume the client body and set it
aside for the filter chain, until that 'variable' has been read completely
or the entire client request body is read.

>(10) Lastly, once EOS is received, we mark the
>warehouse as "warehouse full" in the request-specific configuration notepad.
>    What remains is the warehouse manager.

Yup.  We definitely need the NOT_READ, IN_PROGRESS, COMPLETE and
NO_BODY placeholder :-)

>(11)I think we need a 3-key system
>to manage the warehouse entries: "Data/Name", "Value" and some flag (bit?)
>"Status".  To do this, the parser would start populating entries in the
>warehouse as it comes in (from the filter).

Sounds right.
I was picturing each residing in metadata + data buckets, which I will
write up a description for.

>(12)As soon as each entry is
>completed in the warehouse, the status flag should be set to indicate
>"in-stock".  (13)An entry in the per-request configuration "notepad" can
>contain the name of the current "item" being imported into the warehouse.
>(14)Calls to get data from warehouse (this is the "warehouse manager" part)
>should scan the warehouse entries.  (14a)If an item is "in-stock", no
>additional data-collection is needed.  (14b) If an item is in, but not
>flagged, we call ap_get_brigade until it's flagged "in-stock" by the parser
>(ONLY the parser can import to the warehouse, whereas ONLY the warehouse
>manager can actually read items from the warehouse).  (14c) If the data is
>not found and the "warehouse full" flag is set, the call fails.  (14d)
>Otherwise, we continue to call ap_get_brigade (either explicitly from the
>parser, or implicitly by simply setting the "populate-at-once" flag and
>calling ap_get_brigade once from the parser [I'd say explicit is better,
>simply because it allows us to continually check the warehouse for the
>addition of our data and stop calling ap_get_brigade once our data is
>"in-stock"].)  Once we hit "warehouse full" (note that the warehouse manager
>doesn't care about EOS - all it cares about is "warehouse full") and haven't
>found our data, the call fails.

This all sounds about right.  The biggest problem is consuming the occasional
huge item that exceeds a sanity threshold, e.g. a file upload item, and that
case I'll spell out in another post when I have a few minutes.

>I think that about covers the lifespan of an apreq call.  What do you people
>think?

I hate new metaphors if we require programmers to code to them :-)  I don't
mind them at all for illustration though; yours works pretty well.