httpd-dev mailing list archives

From Henrik Nordstrom <...@squid-cache.org>
Subject Re: mod_disk_cache summarization
Date Fri, 27 Oct 2006 22:36:08 GMT
On Sat, 2006-10-28 at 00:21 +0200, Henrik Nordstrom wrote:
> On Fri, 2006-10-27 at 23:33 +0200, Graham Leggett wrote:
> 
> > A second approach could involve the use of the Etags associated with 
> > file responses, which in the case of files served off disk (as I 
> > understand it) are generated based on inode number and various other 
> > uniquely file specific information.
> 
> How ETags are generated is extremely server dependent, and they are not
> guaranteed to be unique across different URLs. You cannot count at all
> on two files with the same ETag but different URLs being the same
> file, unless you are also responsible for the server providing all the
> URLs in question and know that the server guarantees this ETag behavior
> beyond what the HTTP specification says.

Content-MD5 could be used for this purpose of identifying the same file
served from different URLs, if it weren't for the stupid facts that

a) Few if any servers send Content-MD5

b) The HTTP standard is a bit ambiguous on the meaning of Content-MD5, and
the header can mean different things on 204 responses depending on who
reads the spec.

c) There is no conditional request to ask for a file only if its
Content-MD5 differs. The only way to get the Content-MD5 without
transferring the content when it is unchanged is to send a HEAD request
and compare the header manually. And due to the ambiguity mentioned above
I would not count on Content-MD5 being correct in HEAD responses.

d) And even if the Content-MD5 is the same, it says nothing about the
entity headers (Content-Type etc.). Two responses with different entity
headers are different responses even if their bodies are the same.
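
To make points (a) and (d) concrete, here is a minimal sketch in Python of
what a header comparison would have to look at. The function names and the
particular subset of entity headers compared are hypothetical choices made
for illustration, not anything mod_disk_cache or Squid does, and a match
is only a hint that still needs the body verification discussed below.

```python
# Hypothetical helper; the header subset below is an illustrative assumption.
COMPARED_ENTITY_HEADERS = (
    "content-type",
    "content-encoding",
    "content-language",
    "content-length",
)


def normalize(headers):
    """Lower-case header names and strip whitespace from values."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def may_be_same_entity(headers_a, headers_b):
    """Return True only if both responses carry the same Content-MD5
    and agree on every compared entity header."""
    a, b = normalize(headers_a), normalize(headers_b)
    md5_a, md5_b = a.get("content-md5"), b.get("content-md5")
    # Point (a): if either side lacks Content-MD5 there is nothing to compare.
    if not md5_a or not md5_b or md5_a != md5_b:
        return False
    # Point (d): an equal checksum covers the body only, so the entity
    # headers have to match as well.
    return all(a.get(h) == b.get(h) for h in COMPARED_ENTITY_HEADERS)
```

Values are compared as plain strings here, so two Content-Type values that
differ only in case or parameter order would count as different.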


If you do use Content-MD5 or a similar checksum, you had better verify
that the checksum matches the content before migrating it to another URL.
If not, you could open yourself up to cache pollution attacks.
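
The verification step could look roughly like the Python sketch below. It
assumes the header carries the base64 encoding of the MD5 digest of the
body, as defined in RFC 1864, and that the complete body is already
available as bytes. The function names are hypothetical.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Compute the Content-MD5 value for a body: the base64 encoding
    of the 128-bit MD5 digest (RFC 1864)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def checksum_matches(body: bytes, content_md5_header: str) -> bool:
    """Recompute the checksum from the stored body and compare it with
    the value the server advertised, instead of trusting the header."""
    return content_md5(body) == content_md5_header.strip()
```

A cache would call checksum_matches() on the stored body before reusing it
under a second URL, and refuse the migration if it returns False.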

Regards
Henrik
