From: "Paul J. Reder"
To: "new-httpd@apache.org"
Date: Tue, 24 Oct 2000 15:37:41 -0400
Subject: Mod_include design
Message-ID: <39F5E505.4C7926CF@raleigh.ibm.com>

I have been working on rewriting mod_include. I haven't posted anything
here because I thought I would have it done pretty soon, but I have seen
several posts of patches to get it working, so I thought I should post
some design and status info.

Basically, as I understand it, mod_include needs to be fixed to handle
SSI tags that can span multiple buckets or, in the worst case, multiple
brigades. The rest of the SSI processing (once the tag is obtained) seems
to be ok. In order to do this I am converting mod_include to be a state
machine with the state stored in the attached data structure in the
filter ctx.

HIGH LEVEL DESIGN:

Buckets are parsed until the STARTING_SEQUENCE (or at least its first
byte(s)) is found. Once the STARTING_SEQUENCE is found, buckets are set
aside into the ssi_tag_brigade in ctx until the ENDING_SEQUENCE is found.
The bucket containing the first byte of the STARTING_SEQUENCE is split at
that first byte. All buckets up to the new split bucket are passed on.
If a partial STARTING_SEQUENCE is found at the end of a bucket, that
bucket is split and set aside.
If the bytes in the next bucket do not continue to match the
STARTING_SEQUENCE, the state machine is reset and the set aside buckets
are passed on.

Once the full SSI tag has been parsed and set aside, the tag is copied
into a single buffer for processing and all of the set aside buckets are
released. From here on, processing of the SSI tag proceeds as before.

CURRENT STATUS:

I have rewritten the code that loops through buckets so that it now
operates as a state machine where the state is stored in f->ctx for use
across brigade invocations. mod_include is now able to operate not only
across buckets within a brigade, but across brigades as well. I have
written the code that handles the worst case scenario (one byte per
bucket with one bucket per brigade).

I am in the process of finishing the code to do the bucket splitting and
set aside management. After that I need to write the code to walk the set
aside buckets, copy the tag into a buffer, and release the buckets. Then
test.

I will be writing a test filter to put in front of mod_include that will
do pathological things to brigades (like force the worst case scenario)
so that all of the boundary conditions can be tested in a controlled way.

I'm sorry this has taken me so long, but I have been out of the 2.0 loop
for a while and have had a lot to learn with all of the brigade and
filter code. I have been picking up steam over the past couple of days
and hope to have something working in the next couple of days.

It would be very helpful if I were allowed to finish this without dribs
and drabs of patches to get this piece or that piece fixed in the current
version. Each such patch means I need to manually merge the parts that
still apply into the rewritten code to keep the two in sync.

/****************************************************************************
 * Used to keep context information during parsing of a request for SSI tags.
 * This is especially useful if the tag stretches across multiple buckets or
 * brigades. This keeps track of which buckets need to be replaced with the
 * content generated by the SSI tag.
 *
 * state: PRE_HEAD - State prior to finding the first character of the
 *                   STARTING_SEQUENCE. Next state is PARSE_HEAD.
 *        PARSE_HEAD - State entered once the first character of the
 *                     STARTING_SEQUENCE is found and exited when the
 *                     full STARTING_SEQUENCE has been matched or
 *                     a match failure occurs. Next state is PRE_HEAD
 *                     or PARSE_TAG.
 *        PARSE_TAG - State entered once the STARTING_SEQUENCE has been
 *                    matched. It is exited when the first character in
 *                    ENDING_SEQUENCE is found. Next state is PARSE_TAIL.
 *        PARSE_TAIL - State entered from PARSE_TAG state when the first
 *                     character in ENDING_SEQUENCE is encountered. This
 *                     state is exited when the ENDING_SEQUENCE has been
 *                     completely matched, or when a match failure occurs.
 *                     Next state is PARSE_TAG or PARSED.
 *        PARSED - State entered from PARSE_TAIL once the complete
 *                 ENDING_SEQUENCE has been matched. The SSI tag is
 *                 processed and the SSI buckets are replaced with the
 *                 SSI content during this state.
 * parse_pos: Current matched position within the STARTING_SEQUENCE or
 *            ENDING_SEQUENCE during the PARSE_HEAD and PARSE_TAIL states.
 *            This is especially useful when the sequence spans brigades.
 * X_start_bucket: These point to the buckets containing the first character
 *                 of the STARTING_SEQUENCE, the first non-whitespace
 *                 character of the tag, and the first character in the
 *                 ENDING_SEQUENCE (head_, tag_, and tail_ respectively).
 *                 The buckets are kept intact until the PARSED state is
 *                 reached, at which time the tag is consolidated and the
 *                 buckets are released. The buckets that these point to
 *                 have all been set aside in the ssi_tag_brigade (along
 *                 with all of the intervening buckets).
 * X_start_index: The index within the specified bucket contents
 *                where the first character of the STARTING_SEQUENCE,
 *                the first non-whitespace character of the tag, and the
 *                first character in the ENDING_SEQUENCE can be found
 *                (head_, tag_, and tail_ respectively).
 * combined_tag: Once the PARSED state is reached the tag is collected from
 *               the bucket(s) in the ssi_tag_brigade into this contiguous
 *               buffer. The buckets in the ssi_tag_brigade are released
 *               and the tag is processed.
 * tag_length: The number of bytes in the actual tag (excluding the
 *             STARTING_SEQUENCE, leading and trailing whitespace,
 *             and ENDING_SEQUENCE). This length is computed as the
 *             buckets are parsed and set aside during the PARSE_TAG state.
 * ssi_tag_brigade: The temporary brigade used by this filter to set aside
 *                  the buckets containing parts of the ssi tag and headers.
 */
typedef struct include_filter_ctx {
    /* Declared as a member so the state documented above is actually stored. */
    enum {PRE_HEAD, PARSE_HEAD, PARSE_TAG, PARSE_TAIL, PARSED} state;
    apr_ssize_t parse_pos;
    ap_bucket *head_start_bucket;
    apr_ssize_t head_start_index;
    ap_bucket *tag_start_bucket;
    apr_ssize_t tag_start_index;
    ap_bucket *tail_start_bucket;
    apr_ssize_t tail_start_index;
    char *combined_tag;
    apr_ssize_t tag_length;
    ap_bucket_brigade ssi_tag_brigade;
} include_ctx_t;

--
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein
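
To make the state transitions described in the design above concrete,
here is a minimal standalone sketch, not the actual mod_include code. It
assumes the STARTING_SEQUENCE is "<!--#" and the ENDING_SEQUENCE is "-->"
(this message does not spell them out), and it feeds plain character
arrays in place of buckets, so no splitting or set aside is shown. The
names sketch_ctx, sketch_init, sketch_feed, advance and MAX_TAG are
hypothetical and exist only for this illustration. Whitespace trimming
of the tag is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STARTING_SEQUENCE "<!--#"  /* assumed value, not stated in this mail */
#define ENDING_SEQUENCE   "-->"    /* assumed value, not stated in this mail */
#define MAX_TAG 256                /* hypothetical limit; longer tags are truncated */

typedef enum { PRE_HEAD, PARSE_HEAD, PARSE_TAG, PARSE_TAIL, PARSED } parse_state;

typedef struct {
    parse_state state;
    size_t parse_pos;        /* bytes of the current sequence matched so far */
    char tag[MAX_TAG + 1];   /* most recently completed tag, NUL terminated */
    size_t tag_length;
} sketch_ctx;

static void sketch_init(sketch_ctx *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
    ctx->state = PRE_HEAD;
}

/* Given pos bytes of pat already matched, return the match length after
 * seeing byte c. On a mismatch this falls back to the longest shorter
 * prefix that still fits, so input such as "--->" or "<<!--#" works. */
static size_t advance(const char *pat, size_t pos, char c)
{
    for (size_t k = pos + 1; k > 0; k--) {
        if (pat[k - 1] == c && memcmp(pat, pat + (pos + 1 - k), k - 1) == 0)
            return k;
    }
    return 0;
}

/* Feed one chunk (standing in for one bucket). All state lives in ctx, so
 * a tag may be spread over any number of calls. Returns the number of
 * tags completed inside this chunk. */
static int sketch_feed(sketch_ctx *ctx, const char *data, size_t len)
{
    const size_t head_len = strlen(STARTING_SEQUENCE);
    const size_t tail_len = strlen(ENDING_SEQUENCE);
    int found = 0;

    assert(ctx != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = data[i];

        if (ctx->state == PARSED)          /* previous tag is done; start over */
            ctx->state = PRE_HEAD;

        if (ctx->state == PRE_HEAD || ctx->state == PARSE_HEAD) {
            ctx->parse_pos = advance(STARTING_SEQUENCE, ctx->parse_pos, c);
            if (ctx->parse_pos == head_len) {
                ctx->state = PARSE_TAG;
                ctx->parse_pos = 0;
                ctx->tag_length = 0;
            } else {
                ctx->state = ctx->parse_pos ? PARSE_HEAD : PRE_HEAD;
            }
        } else {                           /* PARSE_TAG or PARSE_TAIL */
            size_t prev = ctx->parse_pos;
            size_t next = advance(ENDING_SEQUENCE, prev, c);

            if (next == tail_len) {
                ctx->tag[ctx->tag_length] = '\0';
                ctx->state = PARSED;
                ctx->parse_pos = 0;
                found++;
                continue;
            }
            /* Bytes that can no longer belong to the tail rejoin the tag. */
            for (size_t k = 0; k < prev + 1 - next; k++) {
                char b = (k < prev) ? ENDING_SEQUENCE[k] : c;
                if (ctx->tag_length < MAX_TAG)
                    ctx->tag[ctx->tag_length++] = b;
            }
            ctx->parse_pos = next;
            ctx->state = next ? PARSE_TAIL : PARSE_TAG;
        }
    }
    return found;
}
```

Calling sketch_feed once per byte imitates the worst case mentioned
above (one byte per bucket, one bucket per brigade): because parse_pos
and state persist in the context between calls, the tag is still
recognized. A real filter would additionally split and set aside the
buckets rather than copy bytes as they arrive.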