httpd-bugs mailing list archives

Subject DO NOT REPLY [Bug 35768] Missing file logs at far too high of log level
Date Thu, 16 Apr 2009 09:07:42 GMT

Jay Freeman changed:

           What    |Removed                     |Added
             Status|CLOSED                      |REOPENED
                 CC|                            |
         Resolution|WONTFIX                     |
           Severity|trivial                     |normal

--- Comment #9 from Jay Freeman 2009-04-16 02:07:37 PST ---
Let me rephrase this problem: the fact that this error setting works this way
is an opening for a denial of service attack on the disk space of the server.
When you are dealing with a high traffic site it is very important that nothing
an end user can do can cause disk space to get used in proportion to activities
they perform.

On my server, I get tens of thousands of 404s every day, and I'm not even a
"large" website (I only have a couple million users). If a user wanted to mess
with me, they'd just start requesting 404s off my server until I run out of
disk space. I go to great lengths to account for every single write that my
traffic causes, from which database entries are stored to which lines are
logged.

To that end, I've turned off almost all of the access logs (having switched to
much more flexible database logging being done by my application), and have
carefully stared at all of the message and error logs on this box. The only
thing that I can't account for is why Apache insists on storing these "File
does not exist" events as if they were "errors".

Yes, I understand that this is "generating an HTTP error". However, there is a
major difference in HTTP between a Client Error (4xx) and a Server Error (5xx).
Lumping them all into the same error category makes no sense on either a
technical or an intuitive level. There is a reason why these two different
types of "error" have fundamentally different codes in HTTP, and it is weird
that the most popular (and most flexible, and all around otherwise awesomest)
webserver doesn't honor the same distinction.
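
The distinction drawn above can be sketched in a few lines of Python. This is
only an illustration of the HTTP status classes, not anything Apache does
today; the function name fault_class is made up for the example.

```python
def fault_class(status: int) -> str:
    """Say which side an HTTP status code attributes the problem to.

    HTTP reserves 4xx for Client Error and 5xx for Server Error;
    every other class is not an error at all.
    """
    if 400 <= status <= 499:
        return "client"
    if 500 <= status <= 599:
        return "server"
    return "none"


if __name__ == "__main__":
    for code in (200, 400, 404, 500, 503):
        print(code, fault_class(code))
```

Under this split, "file does not exist" (404) lands on the client side, while
a failure that produces a 500 lands on the server side.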

In all seriousness: I can't do anything to fix or help a 404. If someone
decides to go to a random URL that is either not supported anymore or never
existed in the first place, I'm not going to wait anxiously at the logs and see
if there's a problem. I also can't do anything to handle 401 errors (some user
typed the wrong password) or 400 errors (someone's using a broken web browser).

In essence, any error that causes a 4xx is completely useless to me, and isn't
an /error/: it's nothing more than an interesting statistical anomaly. The
server errors, though, are important to me (and are hopefully important to
everyone else ;P): these are things I need to fix. More to the point: these are
things I have some hope of fixing at all, and whose occurrence it is solely my
responsibility to minimize.

In fact, the example error listed in the documentation, "premature end of
script headers", is so fundamentally more important than "file does not exist"
that the point should already be clear: the first one is an error caused by me,
and the second one is a mistake caused by a user.

So, thinking about it from this perspective, we can ask whose fault it is that
a line appears in the error log, or better: whose fault it is that 100 bytes of
space just got eaten from the server's disk. There really needs to be a setting
where things caused by me are logged and things caused by a user aren't.
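
To put rough numbers on the disk-space concern, here is a back-of-the-envelope
sketch in Python. The 100 bytes per line comes from the paragraph above; the
request rate of 1000 per second is purely an assumed attack rate chosen for
illustration, and real error-log lines vary in length.

```python
BYTES_PER_LINE = 100          # figure quoted above; actual lines vary
SECONDS_PER_DAY = 24 * 60 * 60


def log_growth_bytes(requests_per_second, seconds, bytes_per_line=BYTES_PER_LINE):
    """Bytes appended to the error log if every request logs one line."""
    return requests_per_second * seconds * bytes_per_line


if __name__ == "__main__":
    # Assumed rate: 1000 requests per second, each producing a 404.
    per_day = log_growth_bytes(1000, SECONDS_PER_DAY)
    print(f"{per_day / 1e9:.2f} GB per day")
```

The point is only that growth is linear in the request rate, which the
requester controls and the administrator does not.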

In fact, this is so fundamental that it almost seems like an orthogonal problem
to error levels. The example warning is even more interesting to me than "file
does not exist": "child process 1234 did not exit, sending another SIGHUP".
While not terribly important, I'm actually kind of curious why my server
instances are being blocked so badly that they aren't reloading cleanly.

To take the fault angle: this is an event that I caused, and that I might have
some interest in fixing. Maybe there needs to be a way to say "don't print
errors to the logs if the errors would cause normal HTTP responses" or "are
caused by the user, not by the administrator"?
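
As a sketch of what such a setting could mean, here is the idea expressed with
Python's standard logging module rather than Apache itself. The class name
ServerFaultOnly and the "status" attribute on log records are assumptions made
up for this example; Apache has no such record field.

```python
import logging


class ServerFaultOnly(logging.Filter):
    """Drop records tied to a 4xx response; keep everything else.

    Assumes (hypothetically) that the application attaches the HTTP
    status code to each record as an attribute named `status`.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        status = getattr(record, "status", None)
        if status is None:
            # Not tied to a response (e.g. a child process warning): keep it.
            return True
        return not (400 <= status <= 499)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("errorlog")
    log.addFilter(ServerFaultOnly())
    log.error("File does not exist", extra={"status": 404})             # dropped
    log.error("Premature end of script headers", extra={"status": 500})  # kept
    log.warning("child process 1234 did not exit")                       # kept
```

The filter is deliberately independent of the log level, which matches the
observation that fault is an orthogonal axis to severity.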

Regardless, the "file does not exist" spam needs to stop. :( A Google search
turns up a bunch of cases of people being burned by this, from some guy who
ended up with a 90GB log file from Google reindexing content he had taken
offline, to companies having to reassure users that the "errors" they have been
seeing in their log files since upgrading their product are harmless.

I'm actually curious; to the people who insist that these entries end up in the
error log: do you actually pay attention to them, and if you do, what do you do
when you see them? What is your tactical response to seeing a 404? Does anyone
out there actually want this in their logs? :(
