httpd-dev mailing list archives

From Jeff Trawick <trawi...@bellsouth.net>
Subject Re: cvs commit: apache-2.0/src/main http_core.c
Date Wed, 25 Oct 2000 17:59:51 GMT
rbb@covalent.net writes:

> On 25 Oct 2000, Jeff Trawick wrote:
> > rbb@covalent.net writes:
> > > On 25 Oct 2000 trawick@locus.apache.org wrote:
> > > 
> > > This is bogus.  This is what the filtering is supposed to prevent.  Why
> > > isn't the charset-lite filter taking care of this?
> > 
> > The existing charset-lite filter is for bodies, not for protocol
> > data.  It doesn't even see most protocol data.
> 
> Why?  Just put it lower in the filter stack, and this problem goes away,
> doesn't it?

No.  Protocol data and text bodies are (in general) in different
character sets.  Binary bodies aren't in character sets at all.  It
would be an entirely different type of filter which would know to
translate protocol data but leave everything else alone.  This is the
hypothetical implementation charset filter.  But a filter alone
doesn't solve the problem.  How would it know what was protocol data?
Certain parts of the HTTP handling would have to attach metadata here
and there to tell an implementation charset filter what is protocol
data.  Any code which manipulates the buckets would have to maintain
the metadata appropriately.  It turns out that it is much simpler to
put the data into the brigade in the right format than it is to
store/maintain the information for some other piece of code to do the
conversion.
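To make the comparison concrete, here is a minimal C sketch of the metadata approach described above. Everything in it is hypothetical: `toy_bucket` and its `is_protocol` flag stand in for whatever tagging real buckets would need, and `count_xlate()` merely counts bytes in place of a real native-to-ASCII conversion, so the sketch runs on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL bucket type.  Real buckets carry no such flag; this is the
 * metadata an implementation charset (ics) filter would depend on. */
typedef struct {
    char  *data;
    size_t len;
    int    is_protocol;   /* 1 = HTTP protocol data, 0 = body (text or binary) */
} toy_bucket;

/* Stand-in for a real native-to-ASCII conversion: it only counts the bytes
 * it would have translated. */
static size_t translated_bytes;
static void count_xlate(char *buf, size_t len)
{
    (void)buf;
    translated_bytes += len;
}

/* The ics filter itself: trivial, provided every bucket is tagged correctly. */
static void ics_filter(toy_bucket *b, size_t n, void (*xlate)(char *, size_t))
{
    for (size_t i = 0; i < n; i++)
        if (b[i].is_protocol)
            xlate(b[i].data, b[i].len);
}

/* The hidden cost: any code that manipulates buckets must carry the flag
 * along, e.g. when splitting one bucket into two at offset `at`. */
static void split_bucket(const toy_bucket *in, size_t at,
                         toy_bucket *left, toy_bucket *right)
{
    assert(at <= in->len);
    left->data  = in->data;
    left->len   = at;
    right->data = in->data + at;
    right->len  = in->len - at;
    /* Forget this line and protocol data silently goes out untranslated. */
    left->is_protocol = right->is_protocol = in->is_protocol;
}
```

The filter loop is a handful of lines; the burden lies in `split_bucket()` and every other piece of code like it, which must remember to propagate the flag.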

> > At one point, the goal was to have an implementation charset filter
> > which on an EBCDIC platform would translate protocol data.  Tony, who
> > introduced the concept in the first place, decided later for reasons I
> > can't remember that it wouldn't be appropriate, and that we would
> > translate directly in the handful of places through which protocol
> > data passes.
> 
> Most of his arguments make sense for input filtering, but I completely
> disagree for output filtering.  There is no reason to hack the clean
> filters that we have to do charset translation.  This should be a filter.
> 
> If we use ifdef's, then we end up inserting an arbitrary limitation on
> what an Apache binary can do.

What arbitrary limitation(s) are you referring to?

Note that keeping the translation of protocol data separate from the
filter which puts it into (or takes it out of) the brigade would cost
performance even on ASCII Apaches, which makes it very likely that the
ifdefs would come back anyway.

At any rate, I see only 4 ifdefs scattered through the code to get
protocol data converted properly (one each in chunk_filter(),
getline(), ap_send_header_field(), ap_bputstrs()).  (A bit of this is
not yet committed.)

A couple of new macros (e.g., ap_xlate_proto_to_ascii() and
ap_xlate_proto_from_ascii()) would clean it up further.  These macros
would be noops on ASCII machines.
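A minimal sketch of what such macros might look like, assuming a hypothetical `CHARSET_EBCDIC` build flag, an in-place `(buffer, length)` signature, and hypothetical translation tables filled in at startup; none of this is committed code. `make_chunk_header()` is likewise a hypothetical helper showing the kind of call site chunk_filter() would have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifdef CHARSET_EBCDIC  /* HYPOTHETICAL build flag for EBCDIC platforms */
/* HYPOTHETICAL 256-entry tables, assumed to be filled in at server startup. */
extern unsigned char proto_native_to_ascii[256];
extern unsigned char proto_ascii_to_native[256];

static void xlate_table(char *buf, size_t len, const unsigned char *table)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)table[(unsigned char)buf[i]];
}
#define ap_xlate_proto_to_ascii(buf, len) \
    xlate_table((buf), (len), proto_native_to_ascii)
#define ap_xlate_proto_from_ascii(buf, len) \
    xlate_table((buf), (len), proto_ascii_to_native)
#else
/* ASCII builds: protocol data is already in the wire charset, so do nothing. */
#define ap_xlate_proto_to_ascii(buf, len)   ((void)(buf), (void)(len))
#define ap_xlate_proto_from_ascii(buf, len) ((void)(buf), (void)(len))
#endif

/* HYPOTHETICAL call site: build the chunk-size line in the native charset,
 * then convert it in place before it would go into the brigade.
 * Returns the line length, or 0 if it does not fit in `out`. */
static size_t make_chunk_header(char *out, size_t outsize, size_t chunk_len)
{
    int n = snprintf(out, outsize, "%zx\r\n", chunk_len);
    if (n < 0 || (size_t)n >= outsize)
        return 0;
    ap_xlate_proto_to_ascii(out, (size_t)n);
    return (size_t)n;
}
```

On an ASCII build both macros expand to expressions that do nothing, so the call sites need no ifdef of their own; for example, `make_chunk_header(buf, sizeof buf, 4096)` yields the 6-byte line `1000\r\n`.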

>                                If this is a filter, this is extensible by
> simply adding another filter.  Since the filter is basically simple to
> implement, I would really like to understand why we aren't doing it.

The ics (implementation charset) filter is really simple, assuming that
various other pieces of code do the hard work of giving it the
information it needs to do the conversion.  Overall, though, the ics
approach is more complex.

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...
