httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Stein <gst...@lyra.org>
Subject Re: [PATCH] mod_ssl input filtering...
Date Mon, 08 Oct 2001 01:51:47 GMT
On Sat, Oct 06, 2001 at 12:06:22PM -0500, William A. Rowe, Jr. wrote:
> From: "Greg Stein" <gstein@lyra.org>
> Sent: Saturday, October 06, 2001 11:11 AM
>...
> > Exactly. So why would you suggest that code that must operate within the
> > *filter* chain be moved to reside there??
> 
> If you suggest creating a setaside_filter.

I'm not suggesting that. If a filter has data that it cannot use or cannot
return at the moment, then it can set that aside for the next invocation.
But I see that as an internal feature of a few filters.

> IF a module or filter MUST not contend
> with more than (X) bytes, then _it_ can place the setaside_filter in the chain.
> (ap_get_client_block could always do so.)  These filters will BE RARE AND FAR 
> BETWEEN.  Take an HTML parser.  It HAS to do it's own setaside regardless, because
> it's the ONLY filter that understands these < > tags, and has to have complete
> tags to operate on.

Well, I don't think the set aside will occur because it only receives "<im"
from f->next. Instead, it will ask f->next for more data, thus forming "<img
src='foo'>", which can be parsed.

set-aside only occurs when a filter retrieves or generates more data than
the caller has asked for. gzip can easily end up doing this -- if the caller
asks for 1k but gzip decodes 3k, then the filter will need to set aside 2k
for a later call. If your HTML filter is, say, insert the image contents
into the stream, then it might end up generating more content than the
caller asked for, and will need to set aside the rest.

> 99% of the time, a given filter that must set-aside doesn't have any
> precognition of where it's setaside boundry will lay.

I'm not sure what this means. Set aside occurs because a filter has more
data than the caller asked for.

The HTTP_IN filter defines the "end of request" boundary, and it enforces
that by simply not requesting any data from f->next that would go past that
boundary.

> We have two choices to deal with these;
> 
>   1. Push back the incomplete data
> 
>   2. Set aside our incomplete data

I can't address this; it seems to follow from your previous statement, which
I don't understand :-)

> Now the current argument is "Well, filter's shouldn't get anything more than they
> bargened for".  That's not a rule for output filters, it shouldn't become one for 
> input filters.

The two filter stacks are *dramatically* different. Entirely different rules
apply.

* the handler *pushes* data down and through the output stack, until it
  ultimately reaches the network.

* the handler *pulls* data up and through the input stack, ultimately from
  the network

ap_pass_brigade() and ap_get_brigade() have entirely different semantics,
and we need to apply entirely different rules about filter construction as a
result of those semantics.

> If an Input filter gets more than it can handle, it can;

Faulty premise. The input filter should never *ask* for more than it can
handle. Therefore, it will never receive too much.

>...
>   3. Never be passed too much data
>...
> I simply argue that all output filters are -required- to handle all data passed
> to them.  There is little good reason to change this rule for input filters.

There are reasons to have different rules for input vs output filters.
However, this *is* one that is common. An input filter must deal with all
the input that it receives from f->next. The simple fact is: that it knows
what it can handle and will never ask for more than it can handle.

>...
> > Handler authors simply fetch data. And our interface for that is
> > ap_get_client_block() just like it has always been for them.
> 
> Right.  Doesn't change.  ap_get_client_block(), given 24kb of data, must give back
> data to the caller one 8kb block at a time.  Another setaside, right there.  Let this
> one function go to that effort.

More than this one function needs to define what f->next will give it.
Specifically, the HTTP_IN filter needs to limit what it asks for from the
next filter.

If a handler asks ap_get_client_block() for 8k of data, then the function
asks the input stack for 8k of data. It isn't going to get any more than
that, so it won't have to do any setaside. (which is fortunate because it
doesn't have f->ctx like filters, so it would need to use request_config
crap to give itself some global context (and I think it does today, but that
can go away))

>...
> > Returning what you were told to return isn't "throttling."
> 
> These are brigades.  The caller _rarely_ knows how much it needs or what is ideal.
> They are already read to memory.

Actually, stuff isn't already read into memory. We read from the socket when
people need more data from the socket.

> Can we shortcut this argument with the following logic?
> 
>   Connection (Protocol) filters
>...
>   Content (Body) filters
>...

Input filters are input filters. There is no need to create different rules
for them.

>...
>   They may
>   setaside part of the content if they cannot parse it complete, to be sent on
>   a future request when they have sufficient content to process.

Hmm. This comment makes me believe you have a mistaken view on input
filters. The filter does not *send* data to a request [handler]. The handler
*asks* for the data.

>...
> > SSL: again, the caller is specifying the boundary, *not* SSL. The SSL filter
> >      may ask f->next for an amount based on what is needed to decode a
> >      packet (that is in the opposite direction from what you mention above).
> 
> No, it's bidirectional constraints.  CORE_IN knows what it can return right now,
> SSL knows how much it needs to continue after a partial packet.

Bidirectional? No. CORE_IN is told what to return. It doesn't inherently
know the "proper" value. It is too low level to have a grasp of what the
caller determines to be proper.

Simple exercise: who are *all* the callers of the CORE_IN filter? What are
*all* of their "proper" values for their input?

The obvious answer is: you cannot know [because it is a pluggable system].

Thus, CORE_IN must be *told* what to return. Thus, no bidirectional info.

>...
> > We ignore in both directions. The all-important piece of information is the
> > filter asking f->next for a specific amount of data. The filter is the only
> > person in the driver's seat; it is the only one who knows what the proper
> > amount of data is for its particular needs, which won't screw up any
> > potential protocol boundary.
> 
> Not all filters care.  The filter chain shouldn't be Constrained until we hit
> the Connection (Protocol) end of the chain.  That's the beginning and end of
> my entire arguement.  You are raving that "WE MUST DO THIS BECAUSE OF CONNECTIONS",
> so you are poluting the Content stack with bogosity from the Connection stack.
> The server's existing Content stack model had poluted the Connection stack.

I'm not raving, and I'm talking about pollution. I am stating that input
filters should be passed a bytes-to-read amount, and that they should never
return more than that.

What you are talking about is something I mentioned in a previous note:

-----------------------------------------------------------------
On Sat, 6 Oct 2001 09:11:20 -0700, Greg Stein wrote:
> The goal is simple: input filters should never return more than what the
> caller asks for. There should be two modes of returning data: return one
> line or return (at most) N bytes.
> 
> [ based on some discussion in this thread, there can/should also be a new
>   mode which is "return whatever you think is a Good Amount [which may also
>   be limited by protocol considerations]". but that work can come once we
>   get the other stuff fixed. ]
> 
> Another goal is to fix the signature of the input filters. That *readbytes
> thing is way wrong.
-----------------------------------------------------------------

That is a second step because it is also preconditioned on getting the input
filters fixed w.r.t Justin's fixes. Filters have not been respecting that
parameter, but they need to. And the darn signature should be fixed along
with a hard, formal definition of the values/combinations of the parameters.
When we have *that*, then we have a platform for "give me a Good Amount".

>...
> > Because the guy asking for 1000 bytes knows that is the request boundary and
> > does not want any more than that. The higher level filter is in control, so
> > screw the lower filter and what it may want to return.
> 
> ONE pair of filters has that dialog.  Most filters don't, and shouldn't care.

Every filter must be able to respect the caller's request for a specified
amount. The caller may have particular requirements over how much data can
be returned.

I think your statement can be reformulated as: "Most filters should pass
AP_MODE_GOOD_AMOUNT, providing f->next with as much flexibility as possible."

[ that symbol name is for exposition purposes only; should be changed ]

Now *that*, I can agree with. And it gives you an input filter stack that is
relatively free of constraints on sizes.

But again: *EVERY* filter must obey the max-to-return parameter if it has
been provided.

>...
> > The HTTP_IN filter was a connection filter because of that return-data
> > thing. It would get too much data from the filter below it (which it *knew*
> > was a socket, btw; it was coded to never call back for more data), so it had
> > to store that into f->ctx. HTTP request parsing is part of request handling,
> > not the connection. By having it part of the connection, it means that you
> > cannot Upgrade the connection to a different protocol. Today, the HTTP_IN
> > filter is a request filter and it will never ask for more than the end of
> > the request, thus leaving the "next request" down in the connection fliters.
> 
> Eliminating the entire argument over how HTTP_IN, SSL_IN and CORE_IN work,
> I don't see how any of this discussion has anything to do with Connection
> filters.  I will say it one last time.  The new model is irrelevant and adds
> additional (unnecessary) code to Content filters.

Filters are filters. There is a particular signature for them, and a
particular semantic for their parameters. All filters should obey those
semantics. That includes the possibility of limiting what can be returned. A
filter simply cannot know what requirements its caller may have, so it
cannot violate its request. Preferably, the callers will give f->next
flexibility.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Mime
View raw message