httpd-dev mailing list archives

From "William A. Rowe, Jr." <wr...@covalent.net>
Subject Re: [PATCH] mod_ssl input filtering...
Date Sat, 06 Oct 2001 17:06:22 GMT
From: "Greg Stein" <gstein@lyra.org>
Sent: Saturday, October 06, 2001 11:11 AM


> But it is also important to recognize that if filter FOO asks for X bytes,
> then it may know that is just what is needed and *it* won't have to do any
> setaside. Some could setaside, some won't.

Bang.  That's the entire disagreement in a nutshell.  I say it's bogus on
Content filters.  I'll start a new thread to introduce my proposed solution.

> > > HUH?? ap_get_client_block() is for modules to fetch data from the input. It
> > > is effectively a block-oriented read function that covers ap_get_brigade().
> > > Filters do *NOT* use that function, so I'm not sure how it is relevant here.
> > 
> > Filter's don't - the ultimate consumer (handler) does.
> 
> Exactly. So why would you suggest that code that must operate within the
> *filter* chain be moved to reside there??

By creating a setaside_filter.  IF a module or filter MUST NOT contend
with more than (X) bytes, then _it_ can place the setaside_filter in the chain.
(ap_get_client_block could always do so.)  These filters will be FEW AND FAR
BETWEEN.  Take an HTML parser.  It HAS to do its own setaside regardless, because
it's the ONLY filter that understands those < > tags, and has to have complete
tags to operate on.
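Here's a rough, untested sketch of what I mean.  The filter name and ctx layout
are hypothetical, I'm assuming the five-argument input filter signature for
illustration, and it only handles the plain read-bytes mode:

#include "httpd.h"
#include "apr_buckets.h"
#include "util_filter.h"

typedef struct {
    apr_bucket_brigade *held;   /* excess data set aside for a later call */
} setaside_ctx;

/* Hypothetical setaside_filter: cap what the caller sees at readbytes and
 * hold the excess, so no filter above ever gets more than it asked for. */
static apr_status_t setaside_in_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                       ap_input_mode_t mode,
                                       apr_read_type_e block,
                                       apr_off_t readbytes)
{
    setaside_ctx *ctx = f->ctx;
    apr_bucket *e;
    apr_status_t rv;

    if (ctx == NULL) {
        ctx = f->ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
        ctx->held = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    }

    if (!APR_BRIGADE_EMPTY(ctx->held)) {
        /* Serve previously set-aside data before reading anything new. */
        APR_BRIGADE_CONCAT(bb, ctx->held);
    }
    else {
        rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
        if (rv != APR_SUCCESS) {
            return rv;
        }
    }

    /* If we hold more than the caller wanted, split at the limit and
     * set the remainder aside for the next call. */
    if (readbytes > 0
        && apr_brigade_partition(bb, readbytes, &e) == APR_SUCCESS
        && e != APR_BRIGADE_SENTINEL(bb)) {
        apr_bucket_brigade *excess = apr_brigade_split(bb, e);
        APR_BRIGADE_CONCAT(ctx->held, excess);
    }
    return APR_SUCCESS;
}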

99% of the time, a given filter that must set-aside doesn't have any
precognition of where its setaside boundary will lie.  We have two choices to
deal with this:

  1. Push back the incomplete data

  2. Set aside our incomplete data

Now the current argument is "Well, filters shouldn't get anything more than they
bargained for".  That's not a rule for output filters, and it shouldn't become one
for input filters.  If an input filter gets more than it can handle, it can:

  1. Push back the too-much data

  2. Set aside the too-much data

  3. Never be passed too much data

What a coincidence... two of these three operations match the filter boundary
problem :)  Now that's duplicate coding: handling two mechanisms, one so we limit
(or throttle, or whatever you want to call the bogus behavior), and one so we can
set aside our own incomplete tags (or push them back).

Options 1 and 2 are an either-or.  What do we want here?  Everyone is saying that
no, push back isn't necessary [I disagree].  But whatever, I'm not fighting for
that mechanism.  So we have RULE 2: a filter with incomplete data must implement
set-aside.  Why not extend that rule, as we had it, so that a filter with too much
data must implement set-aside?

RULE 3 is bogus.  It's double work: the 10% of filters that actually need it have
to add it, and the other 90% couldn't care less but will have to implement it
anyway.  That's BS.

I simply argue that all output filters are -required- to handle all data passed
to them.  There is little good reason to change this rule for input filters.

If that was the old design, and Justin's patch introduced RULE 3, then I veto
that part of his patch for the reasons above.  Now, I'm not saying back it out,
I'm just arguing we go and restore this behavior.

Suddenly, SSL and Proxy don't need all this 'Fixing', because they weren't broke.
They can be cleaned up, certainly, but they both worked, as did charset_lite_in
and all the dozens of input filters a handful of folks have played with.

> Handler authors simply fetch data. And our interface for that is
> ap_get_client_block() just like it has always been for them.

Right.  Doesn't change.  ap_get_client_block(), handed 24KB of data, must give it
back to the caller one 8KB block at a time.  Another setaside, right there.  Let
this one function go to that effort.
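Just so we're all looking at the same picture, that's this familiar handler
pattern (a generic sketch, not taken from any particular module):

#include "httpd.h"
#include "http_protocol.h"

/* A handler draining its request body 8KB at a time: however much the
 * filter chain delivered, each call hands back at most sizeof(buf). */
static int drain_body(request_rec *r)
{
    char buf[8192];
    long nread;
    int rv;

    if ((rv = ap_setup_client_block(r, REQUEST_CHUNKED_DECHUNK)) != OK) {
        return rv;
    }
    if (!ap_should_client_block(r)) {
        return OK;                      /* no body to read */
    }
    while ((nread = ap_get_client_block(r, buf, sizeof buf)) > 0) {
        /* process nread bytes of buf here ... */
    }
    return (nread < 0) ? HTTP_INTERNAL_SERVER_ERROR : OK;
}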

> > Meaning: reading more than (X) bytes will consume body that isn't meant for
> > that filter, or its caller.  That is, more than a line and you may have jumped
> > into a request body (or yet another request's header, if pipelined.)
> 
> Returning what you were told to return isn't "throttling."

These are brigades.  The caller _rarely_ knows how much it needs or what is ideal.
The data has already been read into memory.

Can we shortcut this argument with the following logic?

  Connection (Protocol) filters Constrain the response to max bytes requested.
  They may (if they have additional knowledge, e.g. an SSL crypt filter)
  request more data from the downstream filter, but they must return no more
  than max bytes.

  Content (Body) filters do not need to implement max bytes requested.  They
  may set aside part of the content if they cannot parse it completely, to be
  returned on a future call once they have sufficient content to process.
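To illustrate the Content rule (the filter name is hypothetical, and a real
parser would do the scanning this sketch only gestures at): a body filter that
can't finish parsing simply saves the tail with ap_save_brigade and never looks
at the byte count:

#include "httpd.h"
#include "apr_buckets.h"
#include "util_filter.h"

/* Hypothetical content filter: parse what's complete, set the rest
 * aside with ap_save_brigade, and prepend it on the next call. */
static apr_status_t tag_in_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                  ap_input_mode_t mode,
                                  apr_read_type_e block,
                                  apr_off_t readbytes)
{
    apr_bucket_brigade *held = f->ctx;  /* tail saved on the last call */
    apr_bucket *split = NULL;           /* first bucket of an incomplete
                                         * trailing tag; a real parser
                                         * would set this while scanning */
    apr_status_t rv;

    rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    if (held != NULL && !APR_BRIGADE_EMPTY(held)) {
        /* Put last call's incomplete tag back in front of the new data. */
        APR_BRIGADE_PREPEND(bb, held);
    }

    /* ... scan bb for tags here, pointing 'split' at the start of any
     * incomplete trailing tag ... */

    if (split != NULL) {
        apr_bucket_brigade *tail = apr_brigade_split(bb, split);
        rv = ap_save_brigade(f, &held, &tail, f->r->pool);
        f->ctx = held;
        if (rv != APR_SUCCESS) {
            return rv;
        }
    }
    return APR_SUCCESS;
}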

> So back to your point, ignoring the throttling term: how is it that having
> functions obey their parameters (what to read and how much) is going to be
> buggy? Are you just looking to be contrary?

I don't see a point to Constraining Content filters.  Ever.

Three connection filters need to constrain their response, as you detail:

> CORE_IN: the caller specifies how much to read because the *caller* knows
> where the protocol boundaries are, and it does not want to read
> past that. CORE_IN knows nothing about boundaries, and only barely
> what "should" be read off the network. Optimal behavior is probably
> to read all that it can without blocking.

Within some arbitrary memory footprint, yes.

> SSL: again, the caller is specifying the boundary, *not* SSL. The SSL filter
>      may ask f->next for an amount based on what is needed to decode a
>      packet (that is in the opposite direction from what you mention above).

No, it's bidirectional constraints.  CORE_IN knows what it can return right now,
SSL knows how much it needs to continue after a partial packet.
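Conceptually, the SSL filter runs a loop like this (a sketch only; mod_ssl's
real decode path is far more involved, and the 'need'/'incomplete' bookkeeping
here stands in for record parsing that isn't shown):

#include "httpd.h"
#include "apr_buckets.h"
#include "util_filter.h"

/* Sketch: SSL keeps asking f->next for raw bytes until it has a whole
 * record, yet returns no more than the readbytes the caller asked for. */
static apr_status_t ssl_in_sketch(ap_filter_t *f, apr_bucket_brigade *bb,
                                  ap_input_mode_t mode,
                                  apr_read_type_e block,
                                  apr_off_t readbytes)
{
    apr_bucket_brigade *raw =
        apr_brigade_create(f->c->pool, f->c->bucket_alloc);
    apr_off_t need = readbytes;   /* start from the caller's hint */
    int incomplete = 1;           /* cleared once a record is buffered */
    apr_status_t rv;

    while (incomplete) {
        /* CORE_IN decides how much it can hand back right now... */
        rv = ap_get_brigade(f->next, raw, AP_MODE_READBYTES, block, need);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        /* ...and after a partial packet, only SSL knows how many more
         * bytes it needs before it can decrypt anything at all.  A real
         * filter recomputes 'need' from the record header and clears
         * 'incomplete' when the full record has arrived. */
        incomplete = 0;
    }

    /* Decrypt into bb, return at most readbytes to the caller, and set
     * any surplus plaintext aside in f->ctx for the next call. */
    return APR_SUCCESS;
}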

> HTTP: this is the only filter we have defined which has hard limits on how
>       much the caller can read.

Of course.

> > I warned at the filters meeting a year ago that sometimes the
> > consumer is ignorant of the producer,
> 
> *Never* "sometimes". The consumer is *always* ignorant of the producer, and
> vice versa. *NO* knowledge should be present about who is calling who.

We don't disagree.

> We had some of that in our input stack, and that was a broken model which we
> are now clearing up.
>
> > and we might even need some bidirectional
> > sharing of hints.
> 
> Okay... so you were wrong a year ago, too :-)

Now who's being contrary for the sake of it?

> > Justin's changes appear to say no, the consumer knows
> > everything it needs to.  I argue that if we will ignore one or the other,
> > we ought to ignore the consumer in favor of the producer (network, crypto, 
> > decompression or protocol.)
> 
> We ignore in both directions. The all-important piece of information is the
> filter asking f->next for a specific amount of data. The filter is the only
> person in the driver's seat; it is the only one who knows what the proper
> amount of data is for its particular needs, which won't screw up any
> potential protocol boundary.

Not all filters care.  The filter chain shouldn't be Constrained until we hit
the Connection (Protocol) end of the chain.  That's the beginning and end of
my entire argument.  You are raving that "WE MUST DO THIS BECAUSE OF CONNECTIONS",
so you are polluting the Content stack with bogosity from the Connection stack.
The server's existing Content stack model had polluted the Connection stack.

> > > > I'd like some examples of other filters that can reasonably be
> > > > expected to throttle the filtered bytes.  I really don't see the
> > > > rationale.  If anyone needs a throttle, I'd expect we could do that
> > > > once, correctly, right there in ap_get_client_block, instead of
> > > > within EVERY data-expanding filter.
> > > 
> > > There is that "throttle" again. I'd be happy to answer, but I have no idea
> > > what you're talking about :-)
> > 
> > Determining the amount of data that a filter will get from its read request,
> > without respecting the caller's preference (because the lower level filter has 
> > more information about the transmission characteristics or protocol in place.)
> > 
> > Feel free to give me another word to use here instead of throttle to describe
> > this characteristic.
> 
> "Limit" works better. "Throttle" has connotations of limiting based on
> various runtime situations like bandwidth usage and memory consumption and
> stuff. Throttling is a protective limiting against over-consumption.

I like Constrained, which you introduced above, even better.

> So your question is about which filters will limit the returned data
> regardless of what the caller wants. Only one that I know of is HTTP_IN,
> when it determines the request boundary.

That's the only one I can think of either.

> That limit *must* occur within the filter chain, not up in
> ap_get_client_block. If filters are above HTTP_IN, they are necessarily
> request-level filters, so they only apply to the current request. Therefore,
> they cannot be allowed to read portions of the next request.

Duh.  They couldn't before.  Now, you are going to inflict all this Constraint
logic on the filter chain before we hit HTTP_IN?  I don't think so.

> > Why shouldn't a filter be asked to deal with a cumulative 19kb from an 8kb
> > network packet after it's been munged by filters below that filter?
> 
> Because the guy asking for 1000 bytes knows that is the request boundary and
> does not want any more than that. The higher level filter is in control, so
> screw the lower filter and what it may want to return.

ONE pair of filters has that dialog.  Most filters don't, and shouldn't care.

> > We have
> > a cost of setting aside buckets from the filter behind us, or setting aside
> > buckets for the filter before us.  This all seems a little silly since the
> > old model (if a bit crusty) was working.
> 
> No. It was not working. Did you ever send a chunked request? Pull an old
> version and try that. Based on my reading of the code, you'll hit a
> recursion and crash once your stack runs out. That could have been patched
> in some way, though.

No doubt, and it could have been fixed.  Moving on...

> The HTTP_IN filter was a connection filter because of that return-data
> thing. It would get too much data from the filter below it (which it *knew*
> was a socket, btw; it was coded to never call back for more data), so it had
> to store that into f->ctx. HTTP request parsing is part of request handling,
> not the connection. By having it part of the connection, it means that you
> cannot Upgrade the connection to a different protocol. Today, the HTTP_IN
> filter is a request filter and it will never ask for more than the end of
> the request, thus leaving the "next request" down in the connection filters.

Setting aside the entire argument over how HTTP_IN, SSL_IN and CORE_IN work,
I don't see how any of this discussion involves anything but Connection
filters.  I will say it one last time: the new model is irrelevant to Content
filters, and adds additional (unnecessary) code to them.

> > We could have simply changed the
> > definition of a -1 request to "everything ready to process" instead of the
> > old "entire stream" idea, and the filters would have been easy to tweak.
> 
> That -1 thing was the least of the worries.

Not your worry, but it didn't need to be broken in the first place.

Bill

