Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
Reply-To: new-httpd@apache.org
Sender: rederpj@raleigh.ibm.com
Message-ID: <39F5E505.4C7926CF@raleigh.ibm.com>
Date: Tue, 24 Oct 2000 15:37:41 -0400
From: "Paul J. Reder" <rederpj@raleigh.ibm.com>
MIME-Version: 1.0
To: "new-httpd@apache.org" <new-httpd@apache.org>
Subject: Mod_include design
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I have been working on rewriting mod_include. I haven't posted anything here
because I thought I would have it done pretty soon, but I have seen several
patches posted to get the current code working, so I thought I should post
some design and status info.

Basically, as I understand it, mod_include needs to be fixed to handle SSI
tags that can span multiple buckets or, in the worst case, multiple brigades.
The rest of the SSI processing (once the tag is obtained) seems to be OK.
To do this I am converting mod_include into a state machine, with the state
stored in the filter's ctx (the data structure attached below).

HIGH LEVEL DESIGN:

Buckets are parsed until the STARTING_SEQUENCE (or at least its first byte(s))
is found. The bucket containing the first byte of the STARTING_SEQUENCE is
split just before that byte, and all buckets up to the split point are passed
on to the next filter. From there on, buckets are set aside into the
ssi_tag_brigade in ctx until the ENDING_SEQUENCE is found.
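As a rough illustration of the split step, the sketch below finds where a bucket's data would be divided. It assumes STARTING_SEQUENCE is "<!--#" (the delimiter stock mod_include uses) and models one bucket's data with a hypothetical byte_range struct instead of a real ap_bucket; find_head_start and split_at_head are illustrative names, not httpd or APR functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the same start delimiter as stock mod_include. */
#define STARTING_SEQUENCE "<!--#"

/* Hypothetical stand-in for the data of one bucket. */
typedef struct {
    const char *data;
    size_t      len;
} byte_range;

/* Offset of the first byte of STARTING_SEQUENCE in buf, or len if absent. */
static size_t find_head_start(const char *buf, size_t len)
{
    const char *p = memchr(buf, STARTING_SEQUENCE[0], len);
    return p != NULL ? (size_t)(p - buf) : len;
}

/* Split one bucket's data just before the head start: the bytes before it
 * would be passed on, the rest (possibly empty) would be set aside. */
static void split_at_head(byte_range in, byte_range *pass_on,
                          byte_range *set_aside)
{
    size_t off;

    assert(pass_on != NULL && set_aside != NULL);
    off = find_head_start(in.data, in.len);
    pass_on->data   = in.data;
    pass_on->len    = off;
    set_aside->data = in.data + off;
    set_aside->len  = in.len - off;
}
```

For "abc<!--#echo", the pass-on part is "abc" and the remaining 9 bytes are set aside; a bucket with no '<' is passed on whole.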

If only a partial STARTING_SEQUENCE is found at the end of a bucket, that
bucket is split and the partial match is set aside. If the bytes in the next
bucket do not continue the match, the state machine is reset and the set-aside
buckets are passed on.
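A minimal sketch of carrying a partial match across a bucket boundary, again assuming STARTING_SEQUENCE is "<!--#" and using plain byte buffers in place of buckets. The parse_pos counter plays the role of the ctx field of the same name; match_head and head_result are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STARTING_SEQUENCE "<!--#"   /* assumed, as in stock mod_include */

typedef enum { HEAD_NONE, HEAD_PARTIAL, HEAD_FOUND } head_result;

/* Advance *parse_pos (bytes of STARTING_SEQUENCE matched so far) over one
 * chunk. *parse_pos persists between calls, so a match may span chunks.
 * On HEAD_FOUND, *consumed is the offset just past the sequence and
 * *parse_pos is reset to 0 so the matcher can be reused. */
static head_result match_head(const char *buf, size_t len,
                              size_t *parse_pos, size_t *consumed)
{
    const size_t seq_len = sizeof(STARTING_SEQUENCE) - 1;
    size_t i;

    assert(parse_pos != NULL && consumed != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == STARTING_SEQUENCE[*parse_pos]) {
            if (++*parse_pos == seq_len) {
                *consumed = i + 1;
                *parse_pos = 0;
                return HEAD_FOUND;
            }
        } else {
            /* Mismatch: reset, but re-test this byte as a new start. */
            *parse_pos = (buf[i] == STARTING_SEQUENCE[0]) ? 1 : 0;
        }
    }
    *consumed = len;
    return *parse_pos > 0 ? HEAD_PARTIAL : HEAD_NONE;
}
```

Re-testing the mismatching byte means "<<!--#" is still found. That shortcut is only sufficient because '<' occurs once in "<!--#"; a delimiter that overlaps itself would need a KMP-style fallback table instead of a plain reset.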

Once the full SSI tag has been parsed and set aside, the tag is copied into a
single buffer for processing and all of the set-aside buckets are released.
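The consolidation step could look roughly like the sketch below. The set-aside buckets are modeled as a hypothetical linked list of heap-allocated pieces, and freeing each piece stands in for releasing a bucket; in the real filter the bytes would come from the buckets held in ssi_tag_brigade. piece, append_piece and consolidate_tag are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set-aside piece: a heap copy of part of one bucket. */
typedef struct piece {
    char         *data;
    size_t        len;
    struct piece *next;
} piece;

/* Append a copy of s[0..len) to the end of *list; 0 on success. */
static int append_piece(piece **list, const char *s, size_t len)
{
    piece *p = malloc(sizeof *p);

    if (p == NULL)
        return -1;
    p->data = malloc(len ? len : 1);
    if (p->data == NULL) {
        free(p);
        return -1;
    }
    memcpy(p->data, s, len);
    p->len = len;
    p->next = NULL;
    while (*list != NULL)
        list = &(*list)->next;
    *list = p;
    return 0;
}

/* Copy all pieces into one NUL-terminated buffer and free them (standing
 * in for releasing the set-aside buckets). On allocation failure returns
 * NULL and leaves the list untouched. */
static char *consolidate_tag(piece **list, size_t *out_len)
{
    size_t total = 0, off = 0;
    piece *p;
    char *buf;

    assert(list != NULL && out_len != NULL);
    for (p = *list; p != NULL; p = p->next)
        total += p->len;
    buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;
    while (*list != NULL) {
        p = *list;
        memcpy(buf + off, p->data, p->len);
        off += p->len;
        *list = p->next;
        free(p->data);
        free(p);
    }
    buf[off] = '\0';
    *out_len = off;
    return buf;
}
```

Summing the lengths first means the tag is assembled with a single allocation and a single copy pass.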

From here on, processing of the SSI tag proceeds as before.

CURRENT STATUS:

I have rewritten the code that loops through buckets so that it now operates as
a state machine whose state is stored in f->ctx and therefore survives from one
filter invocation (one brigade) to the next. mod_include is now able to operate
not only across buckets within a brigade, but across brigades as well.

I have written the code that handles the worst-case scenario (one byte per
bucket, with one bucket per brigade).
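To show why keeping all of the parser state in the context makes the one-byte-per-bucket case work, here is a simplified sketch of the five-state machine described in the structure comment. Assumptions: the delimiters are "<!--#" and "-->"; sketch_ctx is a hypothetical stand-in for include_ctx_t that copies tag bytes into a fixed buffer instead of tracking buckets; text outside the tag is skipped rather than passed on; whitespace trimming is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed delimiters (those of stock mod_include). */
#define STARTING_SEQUENCE "<!--#"
#define ENDING_SEQUENCE   "-->"
#define HEAD_LEN (sizeof(STARTING_SEQUENCE) - 1)
#define TAIL_LEN (sizeof(ENDING_SEQUENCE) - 1)

enum parse_state { PRE_HEAD, PARSE_HEAD, PARSE_TAG, PARSE_TAIL, PARSED };

/* Hypothetical, simplified stand-in for include_ctx_t. */
typedef struct {
    enum parse_state state;
    size_t           parse_pos;  /* bytes of head/tail sequence matched */
    char             tag[256];
    size_t           tag_len;
} sketch_ctx;

static void sketch_init(sketch_ctx *c)
{
    memset(c, 0, sizeof *c);
    c->state = PRE_HEAD;
}

/* Append one tag byte; this sketch silently truncates overlong tags. */
static void append(sketch_ctx *c, char ch)
{
    if (c->tag_len + 1 < sizeof c->tag) {
        c->tag[c->tag_len++] = ch;
        c->tag[c->tag_len] = '\0';
    }
}

/* Feed one chunk (one "bucket"); returns 1 once a whole tag is parsed.
 * All state lives in *c, so chunk boundaries may fall anywhere. */
static int feed(sketch_ctx *c, const char *buf, size_t len)
{
    size_t i, k;

    assert(c != NULL);
    for (i = 0; i < len && c->state != PARSED; i++) {
        char ch = buf[i];

        switch (c->state) {
        case PRE_HEAD:                   /* real code passes this text on */
            if (ch == STARTING_SEQUENCE[0]) {
                c->state = PARSE_HEAD;
                c->parse_pos = 1;
            }
            break;
        case PARSE_HEAD:
            if (ch == STARTING_SEQUENCE[c->parse_pos]) {
                if (++c->parse_pos == HEAD_LEN) {
                    c->state = PARSE_TAG;
                    c->parse_pos = 0;
                }
            } else if (ch == STARTING_SEQUENCE[0]) {
                c->parse_pos = 1;        /* e.g. "<<!--#": restart here */
            } else {
                c->state = PRE_HEAD;     /* match failure: reset */
                c->parse_pos = 0;
            }
            break;
        case PARSE_TAG:
            if (ch == ENDING_SEQUENCE[0]) {
                c->state = PARSE_TAIL;
                c->parse_pos = 1;
            } else {
                append(c, ch);
            }
            break;
        case PARSE_TAIL:
            if (ch == ENDING_SEQUENCE[c->parse_pos]) {
                if (++c->parse_pos == TAIL_LEN)
                    c->state = PARSED;
            } else if (c->parse_pos == 2 && ch == '-') {
                append(c, '-');          /* "---" (specific to "-->") */
            } else {                     /* false alarm: bytes are tag data */
                for (k = 0; k < c->parse_pos; k++)
                    append(c, ENDING_SEQUENCE[k]);
                append(c, ch);
                c->state = PARSE_TAG;
                c->parse_pos = 0;
            }
            break;
        case PARSED:
            break;
        }
    }
    return c->state == PARSED;
}

/* Worst case from the text: every byte arrives in its own bucket/brigade. */
static int feed_bytewise(sketch_ctx *c, const char *s)
{
    int done = 0;

    for (; *s != '\0' && !done; s++)
        done = feed(c, s, 1);
    return done;
}
```

Because nothing is kept on the C stack between calls, feeding a document in one call or one byte per call reaches the same final state; for "<p><!--#echo var=\"X\" --></p>" the collected tag is `echo var="X" ` (trailing space included, since trimming is left out here).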

I am in the process of finishing the code that does the bucket splitting and
set-aside management.

After that I need to write the code that walks the set-aside buckets, copies
the tag into a buffer, and releases the buckets. Then test.

I will be writing a test filter to put in front of mod_include that will 
do pathological things to brigades (like force the worst case scenario) so
that all of the boundary conditions can be tested in a controlled way.
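One possible shape for the core of such a test filter is sketched below: it re-delivers a buffer in pieces of a chosen size to whatever consumes it. The chunk_sink callback, rechunk and collector are hypothetical; in httpd the pieces would be individual buckets (or brigades) handed to the next filter rather than callback invocations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback standing in for "the next filter" (mod_include under test). */
typedef void (*chunk_sink)(void *arg, const char *data, size_t len);

/* Re-deliver buf in pieces of at most `chunk` bytes. chunk == 1 puts a
 * boundary between every pair of bytes (the worst case in the text). */
static void rechunk(const char *buf, size_t len, size_t chunk,
                    chunk_sink sink, void *arg)
{
    size_t off;

    assert(chunk > 0);
    for (off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        sink(arg, buf + off, n);
    }
}

/* Simple sink for self-checks: reassembles the chunks and counts calls. */
typedef struct {
    char   buf[256];
    size_t len;
    size_t calls;
} collector;

static void collect(void *arg, const char *data, size_t len)
{
    collector *c = arg;

    if (c->len + len < sizeof c->buf) {
        memcpy(c->buf + c->len, data, len);
        c->len += len;
        c->buf[c->len] = '\0';
    }
    c->calls++;
}
```

Running the same document through every chunk size from 1 up to its length and comparing mod_include's output with the unchunked run would give a simple, repeatable regression check.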

I'm sorry this has taken me so long, but I have been out of the 2.0 loop for
a while and have had a lot to learn about the brigade and filter code. I have
been picking up steam over the past couple of days and hope to have something
working in the next couple of days. It would be very helpful if I were allowed
to finish this without dribs and drabs of patches fixing this or that piece of
the current version; each one means I have to manually merge whatever still
applies into the rewritten code to keep the two in sync.


/****************************************************************************
 * Used to keep context information during parsing of a request for SSI tags.
 * This is especially useful if the tag stretches across multiple buckets or
 * brigades. This keeps track of which buckets need to be replaced with the
 * content generated by the SSI tag.
 *
 * state: PRE_HEAD - State prior to finding the first character of the 
 *                   STARTING_SEQUENCE. Next state is PARSE_HEAD.
 *        PARSE_HEAD - State entered once the first character of the
 *                     STARTING_SEQUENCE is found and exited when the
 *                     the full STARTING_SEQUENCE has been matched or
 *                     a match failure occurs. Next state is PRE_HEAD
 *                     or PARSE_TAG.
 *        PARSE_TAG - State entered once the STARTING_SEQUENCE has been
 *                    matched. It is exited when the first character in
 *                    ENDING_SEQUENCE is found. Next state is PARSE_TAIL.
 *        PARSE_TAIL - State entered from PARSE_TAG state when the first
 *                     character in ENDING_SEQUENCE is encountered. This
 *                     state is exited when the ENDING_SEQUENCE has been
 *                     completely matched, or when a match failure occurs.
 *                     Next state is PARSE_TAG or PARSED.
 *        PARSED - State entered from PARSE_TAIL once the complete 
 *                 ENDING_SEQUENCE has been matched. The SSI tag is
 *                 processed and the SSI buckets are replaced with the
 *                 SSI content during this state.
 * parse_pos: Current matched position within the STARTING_SEQUENCE or
 *            ENDING_SEQUENCE during the PARSE_HEAD and PARSE_TAIL states.
 *            This is especially useful when the sequence spans brigades.
 * X_start_bucket: These point to the buckets containing the first character
 *                 of the STARTING_SEQUENCE, the first non-whitespace
 *                 character of the tag, and the first character in the
 *                 ENDING_SEQUENCE (head_, tag_, and tail_ respectively).
 *                 The buckets are kept intact until the PARSED state is
 *                 reached, at which time the tag is consolidated and the
 *                 buckets are released. The buckets that these point to
 *                 have all been set aside in the ssi_tag_brigade (along
 *                 with all of the intervening buckets).
 * X_start_index: The index points within the specified bucket contents
 *                where the first character of the STARTING_SEQUENCE,
 *                the first non-whitespace character of the tag, and the
 *                first character in the ENDING_SEQUENCE can be found
 *                (head_, tag_, and tail_ respectively).
 * combined_tag: Once the PARSED state is reached the tag is collected from
 *               the bucket(s) in the ssi_tag_brigade into this contiguous
 *               buffer. The buckets in the ssi_tag_brigade are released
 *               and the tag is processed.
 * tag_length: The number of bytes in the actual tag (excluding the
 *             STARTING_SEQUENCE, leading and trailing whitespace,
 *             and ENDING_SEQUENCE). This length is computed as the
 *             buckets are parsed and set aside during the PARSE_TAG state.
 * ssi_tag_brigade: The temporary brigade used by this filter to set aside
 *                  the buckets containing parts of the SSI tag and of its
 *                  STARTING_SEQUENCE and ENDING_SEQUENCE.
 */
typedef struct include_filter_ctx {
    enum {PRE_HEAD, PARSE_HEAD, PARSE_TAG, PARSE_TAIL, PARSED} state;
    apr_ssize_t  parse_pos;
    
    ap_bucket   *head_start_bucket;
    apr_ssize_t  head_start_index;

    ap_bucket   *tag_start_bucket;
    apr_ssize_t  tag_start_index;

    ap_bucket   *tail_start_bucket;
    apr_ssize_t  tail_start_index;

    char        *combined_tag;
    apr_ssize_t  tag_length;

    ap_bucket_brigade *ssi_tag_brigade;
} include_ctx_t;
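The tag_start and tag_length fields have to exclude leading and trailing whitespace even though the tag arrives a bucket at a time. One way to do that, sketched below with a hypothetical tag_tracker struct, is to ignore whitespace until the first non-space byte, and to hold any later whitespace as "pending" until another non-space byte shows it was interior. As a simplification, offsets here count from the end of the STARTING_SEQUENCE rather than within a particular bucket, and the tracker is assumed to see only tag bytes (not the delimiters).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical incremental bookkeeping for tag_start / tag_length. */
typedef struct {
    int    started;     /* seen the first non-whitespace byte yet?      */
    size_t tag_start;   /* offset of that byte                          */
    size_t tag_length;  /* bytes counted so far, trailing ws excluded   */
    size_t pending_ws;  /* whitespace seen since the last non-space byte */
    size_t offset;      /* total bytes fed                              */
} tag_tracker;

/* Account for one byte of the tag; may be called across any number of
 * buckets or brigades because all state lives in *t. */
static void track_byte(tag_tracker *t, char ch)
{
    assert(t != NULL);
    if (isspace((unsigned char)ch)) {
        if (t->started)
            t->pending_ws++;            /* interior or trailing: undecided */
    } else {
        if (!t->started) {
            t->started = 1;
            t->tag_start = t->offset;
        }
        t->tag_length += t->pending_ws + 1;  /* pending ws was interior */
        t->pending_ws = 0;
    }
    t->offset++;
}
```

When the ENDING_SEQUENCE is reached, tag_length already excludes the trailing whitespace: for " echo  var=\"X\"  " it ends up as 13, with tag_start at offset 1.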

-- 
Paul J. Reder