httpd-dev mailing list archives

From TOKI...@aol.com
Subject Re: ContentDigest and filtering
Date Sat, 23 Sep 2000 00:21:29 GMT

In a message dated 00-09-22 15:06:09 EDT, Jeff Trawick writes...

* SHORT ANSWER(s)...

>> TOKILEY@aol.com writes...
>> The MD5 thing won't be the only header value which is
>> going to force ( finally ) good separation in the HTTP
>> protocol handling of what belongs to the CONTENT ( Presentation )
>> layer and what belongs to the Transfer-encoding ( transport )
>> layer.

> which I take to mean that you think it should only be implemented as a
> filter and that there should be no support of it in
> default_handler()...  correct interpretation?

Nope.

I really don't know why things have to be so black and white here.
If mod_auth_xxxx() and apr_MD5 work then don't break them... just make
sure they can be used as EITHER a filter OR as quick-pass APIs by whoever
is controlling that never-never land you are creating between the
presentation and the transport layer. ( More about this below if you care ).

>> TOKILEY@aol.com writes:
>> It's really simple. Who OWNS the MD5 checksum? 
>> Is it a CONTENT thing or a TRANSPORT thing?
>
>  It's a content thing.

In the context of this discussion and Apache's use of MD5, it's a
content thing. It's not always just that. ( More about this below
if you care ).

* END OF SHORT ANSWER(s). STOP HERE IF NOT INTERESTED IN THREAD

* LONGER ANSWER(s)...

Some wireless gateways are using an MD5 + ECC ( Elliptic Curve
Cryptography ) checksum as a Transfer-encoding item but no need to
go there right now. The Apache MD5 is just the ACBT ( After Content,
Before Transport ) RFC thing.
  
> From RFC 2616 (but I'm guessing that you know this already):
>  
>    The MD5 digest is computed based on the content of the entity-body, 
>    including any content-coding that has been applied, but not
>    including any transfer-encoding applied to the message-body. If the
>    message is received with a transfer-encoding, that encoding MUST be
>    removed prior to checking the Content-MD5 value against the received
>    entity. 

Sounds innocent enough but think about what this implies with 
relation to the current filtering design approach.
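
To make the RFC rule concrete, here is a minimal Python sketch of the
ordering it implies: content-coding first, digest second,
transfer-encoding last. The helper names (content_md5, chunk_encode,
chunk_decode) are made up for illustration and are not Apache or APR
APIs, and the chunk decoder ignores trailers and chunk extensions.

```python
import base64
import gzip
import hashlib


def content_md5(entity_body: bytes) -> str:
    """Base64 of the 16-byte MD5 digest, the Content-MD5 value format."""
    return base64.b64encode(hashlib.md5(entity_body).digest()).decode("ascii")


def chunk_encode(body: bytes, size: int = 8) -> bytes:
    """Apply chunked transfer-encoding (transport layer)."""
    out = b""
    for i in range(0, len(body), size):
        piece = body[i:i + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"


def chunk_decode(wire: bytes) -> bytes:
    """Remove chunked transfer-encoding (no trailers or extensions)."""
    body, pos = b"", 0
    while True:
        eol = wire.index(b"\r\n", pos)
        length = int(wire[pos:eol], 16)
        if length == 0:
            return body
        start = eol + 2
        body += wire[start:start + length]
        pos = start + length + 2  # skip chunk data and its trailing CRLF


# Sender: content-coding first, digest second, transfer-encoding last.
page = b"<html><body>hello</body></html>"
entity = gzip.compress(page)      # content-coding (presentation layer)
digest = content_md5(entity)      # computed over the coded entity
wire = chunk_encode(entity)       # transfer-encoding (transport layer)

# Receiver: strip the transfer-encoding, then check the digest.
received = chunk_decode(wire)
digest_ok = content_md5(received) == digest
page_ok = gzip.decompress(received) == page
```

Note that the digest cannot be computed until the last content
filter (here, gzip) has produced its last byte.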

There will ALWAYS be times when no matter how many content
filters have been applied to an outbound data stream there
will be 'things' that need to access the response as a 
whole before one single byte has ever gone to the network.

I really don't think this has sunk in yet.

When a data transfer protocol (like HTTP) is willy-nilly mixing
presentation and transport layer techniques and you are trying to
filter outbound data for same-said protocol that has no in-stream
EOS or standardized informational trailer then it's going to get
messy. Not impossible... just messy.

The apr_MD5 routines themselves depend on knowing the
Content-length of the entire outbound response. The current
filtering scheme can easily lose track of that and assume that
defaulting to 'chunked' Transfer-encoding is going to
carte-blanche solve the content-length tracking problem. That
is proof enough that some things are just going to break at first.

I've tried to talk about the importance of preserving
Content-length in any filtering approach as the data is
being morphed but nobody seems to get it ( Until now? ).

Reminds me of a line from a Star Wars movie...
"Only now... at the end... do you truly see."

What you are stuck with now (since HTTP still requires the
Server to worry about both content and transport layer phases)
is that the only way to make anything work that is absolutely
dependent on knowing the final content length is to have a
DATA_STOPS_HERE filter standing guard at the point of transmit.
It has to collect every single byte headed for the network and
count it ( and store it ) just so routines like apr_MD5 or DES
and/or ZLIB gzip have any chance of working.
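
As a rough illustration of that guard, here is a minimal Python
sketch. DATA_STOPS_HERE is only a name used in this message, so
data_stops_here() and run_chain() below are hypothetical, and the two
toy content filters are assumptions chosen because one of them
changes the byte count.

```python
import base64
import hashlib
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Filter = Callable[[bytes], bytes]


def run_chain(chunks: Iterable[bytes], filters: List[Filter]) -> Iterator[bytes]:
    """Pass each chunk through every content filter, in order."""
    for chunk in chunks:
        for f in filters:
            chunk = f(chunk)
        yield chunk


def data_stops_here(chunks: Iterable[bytes]) -> Tuple[Dict[str, str], bytes]:
    """Hypothetical guard: set aside and count every byte before transmit."""
    set_aside: List[bytes] = []
    total = 0
    md5 = hashlib.md5()
    for chunk in chunks:
        set_aside.append(chunk)   # store it
        total += len(chunk)       # count it
        md5.update(chunk)         # digest it incrementally
    headers = {
        "Content-Length": str(total),
        "Content-MD5": base64.b64encode(md5.digest()).decode("ascii"),
    }
    return headers, b"".join(set_aside)


# Two toy content filters; the second one changes the length.
filters: List[Filter] = [bytes.upper, lambda b: b.replace(b"L", b"LL")]
headers, body = data_stops_here(run_chain([b"hello ", b"world"], filters))
```

Nothing reaches the network until the last filter has finished, which
is the price of knowing the final length and digest up front.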

Somehow it ( DATA_STOPS_HERE filter ) has to KNOW early on 
that it is needed and that some degree of filtering is going to take place
and ( after it's been instantiated for this particular transaction ) if ANY 
filter thinks the data it is generating is going out to the Network Card 
right away then it's sadly mistaken. 

The 'data collection' and 'set aside' issues inherent with any filtering
scheme are simply still being ignored in favor of 'just getting something
that works'.

The generic DATA_STOPS_HERE filter is going to need to appear
sooner or later just so things like MD5 can work the way they 
did before. Does that mean MD5 needs to be a filter itself? 

No, it does not.

* SHOULD MD5 BE A FILTER?

Say it's being used with a form. The Server has sent the NONCE 
challenge along with the HTML form itself. It has the one-time
MD5 vaporkey in it.

If the form passed through ANY kind of 'filtering' at Apache then
obviously the vaporkey had to be computed on the COMPLETE
RESULT of all CONTENT filtering, or it ain't gonna work.

Should the 'final MD5 step' be a filter itself? 

I actually don't think it should be. You have already said
yourself that all filter chains will be visible to all filters
and to move an MD5 checksum into that arena might
bring up a lot of security concerns. The last thing in the
world you want is someone to write some bogus MD5 filter that
can just waltz into the chain and either replace the one
that is there or insert its own 'fake' 32 hex digit signature.

mod_auth wouldn't allow that... but an MD5 'filter' would.

No... 'Content-MD5:' response header(s) are just something that
need to 'happen' in the never-never land between HTTP
content generation and the start of the transport phase.
They really don't have to be supplied by a filter at all.

* FILTERING POINT OF VIEW SHOULD BE 'NO CARE ALGO'...

MD5 is nothing but a 'one way' checksum applied to a block of data.
The 128 stochastically independent bits the algo spits out have no 
calculable relation to the original input but they DO make for a 
good transport checksum even if you don't care about users.
Lots of people use it for all kinds of 'encoding' and 'transport' things.

MD5 might not be around for much longer, anyway. It's too easy to break.
SHA will probably replace it. SHA is resistant to the attacks that
succeed against MD5 and to differential cryptanalysis while MD5 is not.

Ron Rivest, author of MD5, suggested this change...

Content-Digest: 2A1238912371239587; alg=SHA

But I think HTTP 1.2 is also leaning towards this piece
of hackery that keeps the 'Content-MD5:' so legacy user agents
don't break but allows MD5 to be replaced with another (better) 
algo....

Content-MD5: 2A1238912371239587; alg=SHA

Whatever. They'll work it out.

When enough MIPS are common then SHA will fall
like swimming records at the Olympics as well so
the key is to be ready for anything. Whatever they
come up with next you can be sure of one thing...

From a 'filtering' standpoint it will still need to happen
in the same place it always has before.

Solve this once and it should stand through whatever
MD5/SHA/ECC phase the security world loops through.
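
One way to picture the 'no care algo' idea is a final digest step
that takes the algorithm as a parameter, so the place where it runs
never changes. This is only a Python sketch: digest_header() is a
hypothetical helper, and the hex value plus '; alg=' syntax simply
mirrors the example headers shown earlier in this message, not any
published standard.

```python
import hashlib


def digest_header(body: bytes, alg: str = "md5") -> str:
    """Build a digest value in the '<HEX>; alg=<NAME>' style (hypothetical)."""
    h = hashlib.new(alg)          # raises ValueError for unknown names
    h.update(body)
    return "%s; alg=%s" % (h.hexdigest().upper(), alg.upper())


final_body = b"fully filtered response body"
md5_value = digest_header(final_body, "md5")
sha_value = digest_header(final_body, "sha1")
```

Swapping the algorithm changes one argument; the step itself still
runs after the last content filter and before transport.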

Would a standard in-stream EOS for HTTP and/or a standardized
informational message trailer help out with any scheme that
relies on counting/checking/summing bytes as they 'head out
the door'?... you damn betcha. Is that going to appear in
HTTP anytime soon? No way. Gotta deal with it the way it is.

Yours...
Kevin Kiley
CTO, Remote Communications, Inc.
http://www.RemoteCommunications.com/
http://www.RemoteCommunications.com/rctpd/ - Free IETF encoding Server
