httpd-docs mailing list archives

From "David E. Weekly" <da...@thereinc.com>
Subject The Peril Of Using ETags In A Cluster
Date Fri, 25 Apr 2003 01:01:08 GMT
Apache folks,

I just wrote a quickie article on how ETags can bite you in a cluster when
you leave the default ETag construction set enabled (which includes the
*inode* of the file!). I hope this gets documented somewhere on Apache.org,
too! :)

-david

==================================


http://david.weekly.org/writings/etags.php3

Apache administrators: beware ETags if you have more than one web server!
(If you only have one web server, this article will not be useful to you.)

HTTP/1.1 added the "ETag" response header to allow a server to define its
own way of uniquely identifying a point-in-time version of a specific file.
The ETag is opaque to the client; it's just a string. When re-requesting a
document, the client sends back the ETag it has cached in an "If-None-Match"
header. If that value does not match the server's current ETag for the file,
the server must retransmit the document, even if the HTTP/1.0
"If-Modified-Since" header exactly matches the "Last-Modified" date of the
file.
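
As a rough sketch of the check described above (this is not Apache's actual
code), the decision can be written as a small Python function. The function
name and the plain dict of request headers are assumptions made for
illustration, and the date check is simplified to an exact string match;
a real server parses the dates and also handles weak validators.

```python
def should_send_full_response(request_headers, current_etag, last_modified):
    """Return True if the full document must be sent, False if a
    304 Not Modified response is enough (simplified illustration)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may carry several comma-separated ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # When the client sends an ETag, the ETag comparison decides:
        # a matching date cannot rescue a mismatched ETag.
        return not ("*" in candidates or current_etag in candidates)

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Simplification: exact string match instead of date parsing.
        return if_modified_since != last_modified

    # No validators at all: unconditional request.
    return True
```

With both headers present and a matching date, a stale ETag still forces a
full retransmission, which is the case the rest of this article is about.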

This wouldn't be so bad if it weren't for the way that Apache implements
ETag support by default. The default setting is to incorporate the file's
last modification time, its current size, and its Unix inode number. The
first two make sense; I can understand wanting to make sure that both the
last-modified time and the size match what's on the client. But
incorporating the inode leads to some very bad behavior on clusters, because
a given file, such as LOGO.JPG, might have the same size and modification
time on all of the web servers of the cluster, but its inode number will
almost certainly be different on each one.

This means that if you have four web servers and clients are sent to one at
random, three times out of four the client's stored ETag will not match the
server's and the server will needlessly be forced to retransmit the file to
the client. As the number of web servers grows, the situation quickly
approaches the point where effectively no caching is happening at all.
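
To make the arithmetic concrete, here is a minimal Python sketch. The
make_etag helper and its hex "inode-size-mtime" layout are only loosely
modeled on what Apache emits and should be treated as an assumption, as
should the sample inode, size, and mtime values. The mismatch fraction
assumes every request goes to a uniformly random server and that every
server has a different inode for the file.

```python
def make_etag(inode, size, mtime, use_inode=True):
    """Build a quoted ETag string from file attributes (illustrative format)."""
    fields = [size, mtime]
    if use_inode:
        fields.insert(0, inode)
    return '"' + "-".join(format(f, "x") for f in fields) + '"'


def mismatch_fraction(num_servers):
    """Fraction of requests that land on a server other than the one that
    issued the cached ETag, assuming uniform random server selection."""
    return (num_servers - 1) / num_servers


# Same file on two servers: identical size and mtime, different inodes.
size, mtime = 10240, 1051232468
etag_a = make_etag(inode=0x1A2B, size=size, mtime=mtime)
etag_b = make_etag(inode=0x9F3C, size=size, mtime=mtime)
print(etag_a == etag_b)  # False: the inode alone makes the ETags differ

# Without the inode, both servers produce the same ETag.
print(make_etag(0x1A2B, size, mtime, use_inode=False)
      == make_etag(0x9F3C, size, mtime, use_inode=False))  # True

print(mismatch_fraction(4))  # 0.75
```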

This is all compounded by a bug that I found in Internet Explorer 5 and 6:
if the downloaded file's Last-Modified header matches the If-Modified-Since
header IE sent in the request, IE doesn't bother to update its cached ETag.
This means that even if you were to force IE to keep connecting to the same
server (with the same inode for the file, etc.), once it's made up its mind
about an ETag it won't change it until the Last-Modified time changes!

To fix this insanity, stick the following line in your Apache httpd.conf:

FileETag MTime Size

This tells Apache to construct ETags from only the modification time and the
file size; specifically, it prevents Apache from using the inode of the file
in the ETag. Then touch all of your files to update their last-modified
times. The next time a client goes to your page, it will re-download the
files, since the last-modified time changed, but then it will have the
"simplified" ETag (without an inode) and won't have to download a file again
until that file actually changes. Your pages will be much snappier! :)
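
One way to do the "touch" step is sketched below in Python. The touch_tree
helper is hypothetical, not part of Apache or any standard tool. It takes
an explicit timestamp on the assumption that you want every server in the
cluster to end up with the same modification times; touching files
separately on each machine with "now" could leave the mtimes, and therefore
the new ETags, slightly different from server to server.

```python
import os


def touch_tree(root, timestamp):
    """Set the access and modification time of every file under root to
    the given Unix timestamp. Returns the number of files updated."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Explicit (atime, mtime) pair so the result is reproducible
            # across machines, unlike a plain "touch to now".
            os.utime(path, (timestamp, timestamp))
            count += 1
    return count
```

You would run something like touch_tree(docroot, chosen_time) on each
server, passing your own document root and the same integer timestamp
everywhere.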



