httpd-dev mailing list archives

From Thomas Wouters <tho...@xs4all.net>
Subject ETags & NFS caches (Apache 1.3)
Date Thu, 26 Oct 2000 20:43:02 GMT


Apologies if this isn't the right list. I don't consider this a bug report
yet, it's not a feature request, and it's not Apache 2.0 related (though I
haven't seen anything to suggest 2.0 won't show the same 'problem'), but I do
want to bring it to your attention, and possibly elicit a response;
this might be a common issue, after all. Also apologies for the length. I
hope it's interesting to at least some of you ;P

Basically, what I'm seeing is the exact same ETag and Last-Modified headers
for two different versions of a file, from the same Apache 1.3 server, within
a few seconds of each other. If you don't believe me, I have about 100 output
files, HTTP response headers and their corresponding reply bodies, all of
which have the same Last-Modified and ETag headers, but half of which have
slightly different contents. In this case it's a one-character difference, but
based on the bug reports we got, it's not limited to single-character
differences.

I'm the guy who whined about there not being an 'extreme performance tuning'
session(*) at the ApacheCon earlier this week, and I really meant it :) This
bug happens to manifest itself only on our single most frequented website,
www.startpagina.nl, and its companion, www.pagina.nl. The site was
recently moved from a single P3/fast++ server running BSDI 4.0.1 and Apache
1.3.9 to a cluster of first two, now three, and soon four P3/fast++ machines
running BSDI 4.1 with Apache 1.3.12. The cluster is load-balanced using an
Alteon Layer-4 ethernet switch, which is working just fine (we use those
switches for most of our services nowadays), and the actual website content
is stored on a Network Appliance Filer (a dedicated NFS server). Each server
handles between 2k and 3k requests per minute, depending on the time of day.
The limiting factor seems to shift with each change we make, but rest
assured that it isn't Apache per se ;)

What we're seeing now, however, is that for the most frequently requested
files only, the NFS cache seems to get confused. If you modify a file
on one of the servers, the change is instantly visible on that machine. On
all other servers, the mtime (and thus the Last-Modified and, apparently,
the ETag headers) changes, but the file Apache serves is still the old
file, with the old contents. The weird thing is that this is Apache-specific:
open the file in any other process and it'll show the new contents. What's
more, if you do that, Apache starts showing the new contents right away.
Also, it only happens for the most frequently accessed files; we haven't been
able to reproduce it by just placing a file in the document root. We had to
make innocent, tiny changes to the actual index page and its frames to
detect the behaviour.

The big problem with this is not the NFS caching. In due time, for some
reason or another, Apache will start serving the 'new' contents. But because
the Last-Modified and ETag headers are based on the mtime, *those headers do
not change* when the contents do. As a result, either because they use the
If-* HTTP 1.1 extensions or because of their own caching mechanisms,
proxies and browser caches don't see the change until you force them to
reload the file, or until the *next* edit of the same file. This is
obviously not right :P
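
To make the failure mode concrete, here is a minimal Python sketch, not
Apache's actual code. It assumes a validator built from nothing but stat()
fields (inode, size and mtime, which is how I understand the default 1.3 ETag
to be put together); the names mtime_etag and demo are made up for
illustration. It changes one byte of a file while keeping the mtime the same,
and the tag does not move:

```python
import os
import tempfile


def mtime_etag(path):
    """Build a validator from stat() metadata only (assumed inode-size-mtime)."""
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, st.st_mtime_ns // 10**9)


def demo():
    """Change a file's contents without changing its metadata-based tag."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello world\n")
        path = f.name
    try:
        st = os.stat(path)
        before = mtime_etag(path)
        # Overwrite the first byte in place: same inode, same size.
        with open(path, "r+b") as f:
            f.write(b"j")
        # Put the old timestamps back, so only the contents differ.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
        after = mtime_etag(path)
        with open(path, "rb") as f:
            body = f.read()
        return before, after, body
    finally:
        os.unlink(path)
```

A cache holding the first tag would get a 'not modified' answer for the
second body, which is the symptom described above, just provoked by hand
instead of by the NFS cache.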

We had a perl process in a tight loop doing 'stat()' and 'open()' and
computing a checksum of the contents, and it 'saw' the change almost
immediately. We can't figure out what Apache is doing differently, but it may
be as simple as asking for the file so many more times that it triggers a bug
in the OS. I've looked at this and peered at this and boggled over this since
last Saturday, together with a colleague, and we've both more or less come to
the conclusion that it *has* to be an OS bug (David Reid (*wink*) will know
that I know about BSDI bugs ;-), mostly because Apache immediately starts
serving the 'right' contents the moment the file is accessed by another
process. But this still doesn't explain why Apache isn't seeing the changes
in the first place, while our attempts to do the same things Apache does,
in a similar fashion and at a similar frequency, fail to reproduce it. Does
anyone have any clue what we are missing here?
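
For anyone who wants to try the same kind of check, this is roughly what
such a watcher does, sketched in Python rather than the perl we actually
ran. The function names and the choice of SHA-256 are assumptions for the
sketch; any checksum would do. Each pass does one stat() and one full read,
and two passes are compared to see whether the mtime and the data moved
together:

```python
import hashlib
import os


def snapshot(path):
    """Return (mtime in ns, hex checksum) from one stat() and one full read."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return st.st_mtime_ns, digest


def classify(old, new):
    """Name what differs between two snapshots of the same path."""
    mtime_changed = old[0] != new[0]
    data_changed = old[1] != new[1]
    if mtime_changed and data_changed:
        return "both"
    if mtime_changed:
        return "mtime-only"  # new metadata, old bytes: the suspicious case
    if data_changed:
        return "data-only"   # new bytes behind an unchanged mtime
    return "unchanged"
```

Calling snapshot() in a loop and logging every classify() result other than
"unchanged" is enough to see whether a given process ever observes the new
mtime together with the old contents.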

We've currently worked around it by adding this to cron:

* * * * *	cat <docroot>/* > /dev/null

but this still leaves a tiny window for clients to get new ETags and old
data, and obviously isn't an optimal solution :) Aside from that, though,
there is an honest-to-god 'bug' here. According to the HTTP 1.1 RFC: "An
entity tag MUST be unique across all versions of all entities associated
with a particular resource." Apache relies on the mtime to signal that a new
ETag should be generated, and this apparently breaks in some situations. On
the other hand, I can think of no other way to handle this :) If you cannot
rely on the mtime, you have to read in and digest the file every
time. This is ridiculous, and also exactly the reason I'm not inclined to
file a bug report.
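
For comparison, the read-and-digest alternative would look something like
the Python sketch below. This is only an illustration of the idea, not
something Apache 1.3 does; the name content_etag, the chunk size and the use
of SHA-256 are all assumptions:

```python
import hashlib


def content_etag(path, chunk_size=65536):
    """Derive a validator from the bytes actually read, not from the mtime."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return '"%s"' % h.hexdigest()
```

A tag like this always matches whatever data the server really served, stale
or not, so a one-character change yields a different tag. The price is a
full read of the file for every request, which is exactly the cost that
makes it unattractive at a few thousand requests per minute.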

We'll be researching this weird bug some more, by the way... it might not be
BSDI, but rather the NetApp filer OS that contains the bug. We're also going
to vary the Apache version and, if nothing else helps, try a few other OSes.
That is a fairly radical approach, though :P If there is any interest, I'll
keep you posted.

Highest regards,
	Thomas.

(*) About that whine: it wasn't a whine, just a suggestion :) If papers are
still being accepted for ApacheCon 2001-US, I'd be happy to submit a
proposal. If not, I'll wait for 2001-EU, which I hope will be held in
Amsterdam or thereabouts ;-)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
