From: "William A. Rowe, Jr." <>
Subject: Pondering strings in Apache 3.x
Date: Tue, 19 Jul 2005 19:55:26 GMT
Greg and a few others voiced interest in moving from NUL-terminated
strings to counted strings for a future version of Apache.
This was too broad a scope change to make it into 2.0, of course,
and was dropped on the floor for the time being.

I'm wondering today: what metadata interests us in an ap_string_t
prefix header?  I have a hunch that an unsigned short length (at
most 65535 bytes) is enough to map most data we want to handle in
one chunk; brigades are better for handling large sets of data.
Of course we could widen that to an int or size_t, at a small
memory penalty per string.  That penalty might be offset by
CPU-specific handling of native int or size_t values, since the
generated code wouldn't need to truncate or widen short values.
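
To make the size trade-off concrete, here is a minimal sketch assuming the
header sits immediately in front of the string bytes (a C99 flexible array
member). The type and function names (ap_str16_t, ap_strsz_t, ap_str16_make)
are hypothetical, and plain malloc stands in for pool allocation.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical prefix-header layouts; ap_string_t has no defined
 * fields yet, so these names are for illustration only. */
typedef struct {
    unsigned short len;   /* 16-bit count: at most USHRT_MAX (65535) bytes */
    char data[];          /* string bytes follow the header directly */
} ap_str16_t;

typedef struct {
    size_t len;           /* native-width count: no truncation, larger header */
    char data[];
} ap_strsz_t;

/* Copy n bytes into a 16-bit-counted string.  Returns NULL when n does
 * not fit in the count (such data belongs in a brigade) or on OOM. */
static ap_str16_t *ap_str16_make(const char *src, size_t n)
{
    ap_str16_t *s;

    if (n > USHRT_MAX)
        return NULL;
    s = malloc(sizeof *s + n);
    if (s == NULL)
        return NULL;
    s->len = (unsigned short)n;
    memcpy(s->data, src, n);
    return s;
}
```

On a typical LP64 platform the 16-bit header occupies 2 bytes and the
size_t header 8, which is the per-string penalty in question.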

Perhaps we want both bytes allocated and bytes used, in order to play
optimization games with string allocation.  Perhaps a refcount?  (This
doesn't play well with pool allocation, obviously, since pool memory
is released all at once when the pool is cleared.)
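
A minimal sketch of what tracking both counts buys, assuming plain
malloc/realloc in place of APR pools: appends that fit in the spare
capacity happen in place, and only overflow triggers a reallocation.
The ap_strbuf_t type and ap_strbuf_append function are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header holding both bytes used and bytes allocated.
 * malloc/realloc stand in for pool allocation in this sketch. */
typedef struct {
    size_t used;    /* bytes of string data currently stored */
    size_t alloc;   /* capacity of buf in bytes */
    char *buf;
} ap_strbuf_t;

/* Append n bytes.  Reallocates only when the spare capacity
 * (alloc - used) is too small, doubling so repeated appends stay
 * amortized O(1).  Returns 1 on success, 0 on allocation failure. */
static int ap_strbuf_append(ap_strbuf_t *s, const char *src, size_t n)
{
    if (n == 0)
        return 1;
    if (s->alloc - s->used < n) {
        size_t cap = s->alloc ? s->alloc : 16;
        char *p;

        while (cap - s->used < n)
            cap *= 2;               /* size_t overflow ignored in this sketch */
        p = realloc(s->buf, cap);
        if (p == NULL)
            return 0;
        s->buf = p;
        s->alloc = cap;
    }
    memcpy(s->buf + s->used, src, n);
    s->used += n;
    return 1;
}
```

Since APR pools have no realloc, a pool-backed version would have to copy
into a new, larger pool block and abandon the old one until the pool is
cleared; the allocated count at least avoids that copy while there is room.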

But the byte count clearly isn't enough.  I'm thinking of:

  encoding:  is this data URI-escaped or unescaped?

  tainted:   is it raw, or has it been untainted by
             context-specific validity checks?

  charset:   is it the native charset (e.g. EBCDIC)?  UTF-8?
             opaque, or some other specific set?
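
One way the three properties above might be carried is as small
bit-fields next to the length.  This is a sketch under stated
assumptions: every name here (ap_string_meta_t, ap_string_untaint,
ap_string_safe_for_url, the enum constants) is hypothetical, and the
validator is supplied by the caller because the checks are
context-specific.

```c
#include <stddef.h>

/* Hypothetical metadata values; names are illustrative only. */
typedef enum { AP_ENC_RAW, AP_ENC_URI_ESCAPED } ap_str_encoding_e;
typedef enum { AP_TAINTED, AP_UNTAINTED } ap_str_taint_e;
typedef enum { AP_CS_NATIVE, AP_CS_UTF8, AP_CS_OPAQUE } ap_str_charset_e;

typedef struct {
    size_t len;
    unsigned encoding : 1;   /* holds an ap_str_encoding_e */
    unsigned taint    : 1;   /* holds an ap_str_taint_e */
    unsigned charset  : 2;   /* holds an ap_str_charset_e */
    const char *data;
} ap_string_meta_t;

/* Caller-supplied, context-specific check: nonzero means valid. */
typedef int (*ap_str_validator_fn)(const char *data, size_t len);

/* Clear the taint flag only if the validator accepts the data. */
static int ap_string_untaint(ap_string_meta_t *s, ap_str_validator_fn ok)
{
    if (!ok(s->data, s->len))
        return 0;
    s->taint = AP_UNTAINTED;
    return 1;
}

/* Example gate: only untainted, URI-escaped data may go into a URL. */
static int ap_string_safe_for_url(const ap_string_meta_t *s)
{
    return s->taint == AP_UNTAINTED && s->encoding == AP_ENC_URI_ESCAPED;
}
```

The point of the gate is that a consumer checks the flags instead of
trusting that some earlier code path escaped or validated the data.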

What else would we want in an 'ap_string_t' header that would
help eliminate bugs within httpd?  A random trailing short
following the string, in a 'string debug' mode, to detect
buffer overflows?  Something similar to detect
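
The trailing-canary idea could look roughly like this sketch, assuming
a single per-process 16-bit value written just past the string bytes
and verified later.  The names are hypothetical, and rand() is only a
placeholder; a real debug mode would want a better random source.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical 'string debug' canary, chosen once per process. */
static unsigned short ap_str_canary;

static void ap_str_debug_init(unsigned seed)
{
    srand(seed);   /* placeholder randomness, not cryptographic */
    ap_str_canary = (unsigned short)(rand() & 0xFFFF);
}

/* Allocate len bytes plus room for the canary right after them.
 * memcpy is used because p + len may be unaligned for a short. */
static char *ap_str_debug_alloc(size_t len)
{
    char *p = malloc(len + sizeof ap_str_canary);

    if (p != NULL)
        memcpy(p + len, &ap_str_canary, sizeof ap_str_canary);
    return p;
}

/* Returns 1 if the canary after len bytes is intact, 0 if it was
 * overwritten (for example by a write past the end of the string). */
static int ap_str_debug_check(const char *p, size_t len)
{
    return memcmp(p + len, &ap_str_canary, sizeof ap_str_canary) == 0;
}
```

Because the value is random per process, an overrun is unlikely to
rewrite it with the same bytes by accident.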

Open to all ideas.


