httpd-users mailing list archives

From Chris 'Xenon' Hanson <xe...@3dnature.com>
Subject [users@httpd] Meaning and interpretation of 206 code sizes in Apache logs
Date Mon, 12 Feb 2007 18:40:34 GMT
   The VTP project (vterrain.org) recently had an interesting experience where 
bandwidth-monitoring (and billing) procedures reported an enormous bandwidth spike, 
resulting in a big hosting bill.

   The billing situation was resolved amicably, but in the spirit of understanding what 
really happened, we've been investigating the cause and effect. It appears that the 
traffic came from (legitimate, non-malicious) Far East users, probably on the other side 
of a problematic Internet connection, most likely caused by the damage to an undersea 
cable in December. The result of the network unreliability was that they had difficulty 
downloading large data intact from the VTP server, and employed a common 
download-accelerator that is able to keep retrying the server and resuming a file-transfer 
partway through. All of this is very normal and common.

   From the logs, it would appear that the download accelerator requests the file with a 
Range header (with a fairly large range). Then, Apache begins sending the data. Shortly, 
the connection fails (after the client has received only a little of the data) and Apache 
logs the request. The process repeats. Eventually, the client does receive the entire 
file, but not until many, many 206 (Partial Content) entries have been logged.
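
   To make the resume step concrete, here is a minimal Python sketch of how a resuming 
client might build such a request. The helper name build_resume_request, the example.org 
URL and the offset are all assumptions for illustration; we have not inspected what the 
actual download accelerator sends. The code only constructs the request and never 
contacts a server.

```python
from urllib.request import Request


def build_resume_request(url: str, bytes_already_received: int) -> Request:
    """Build a GET request asking for the file from a given byte offset onward.

    Hypothetical helper: a real download accelerator may instead split the
    file into several bounded ranges and fetch them in parallel.
    """
    if bytes_already_received < 0:
        raise ValueError("offset must be non-negative")
    # "bytes=N-" means "from byte N to the end of the file" in an HTTP/1.1
    # Range header. A server that honours it answers 206 Partial Content.
    range_value = f"bytes={bytes_already_received}-"
    return Request(url, headers={"Range": range_value})


# Placeholder URL and offset, chosen only for the example.
req = build_resume_request("http://example.org/data/island_8k.zip", 1_000_000)
print(req.get_header("Range"))
```

   Each retry would carry a new offset, so one download can plausibly produce a long 
series of 206 entries for the same file.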


   According to the host (Hurricane Electric, HE.net):
http://www.he.net/faq/traffic_storage.html
"What Methods do you use to determine traffic usage?
We determine web traffic usage by extracting information from the access_log files 
generated by the HTTP daemon."

   Based on this, we come to some interesting contradictions:


From http://vterrain.org/.status/web.html

Time  Requests  Bytes Sent
----- -------- ------------
00:00      893     18955972
01:00    22384 141295771628
02:00     4330   3881123738
03:00      626      7740512

Yup, in one hour, a naive count of Apache's "bytes transferred" says
there was 141 GB of traffic.  That is about a month's worth of normal
traffic, in an hour.
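
As a rough sanity check on the table, the short Python sketch below converts each
hourly "Bytes Sent" figure into decimal gigabytes and into the average rate the
server would have had to sustain for the whole hour. The numbers are copied from
the table; the function name and the use of decimal units (1 MB = 10^6 bytes)
are choices made for this example.

```python
# Hourly "Bytes Sent" figures copied from the status table.
hourly_bytes = {
    "00:00": 18955972,
    "01:00": 141295771628,
    "02:00": 3881123738,
    "03:00": 7740512,
}

SECONDS_PER_HOUR = 3600


def average_rate_mb_per_s(total_bytes: int) -> float:
    """Average rate over one hour, in decimal megabytes per second."""
    return total_bytes / SECONDS_PER_HOUR / 1e6


for hour, total in hourly_bytes.items():
    gigabytes = total / 1e9
    rate = average_rate_mb_per_s(total)
    print(f"{hour}  {gigabytes:10.3f} GB  {rate:8.3f} MB/s average")
```

For the 01:00 row this works out to roughly 39 MB/second (about 314 Mbit/second)
sustained for the full hour.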

Apache's log has endless amounts of this:

211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 10359197
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 42725451
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 15765857
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 37503695
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 21176362
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 5127115
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 10359197
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 42725451
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 37503695
211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 15765857
211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 21176362

If those were real bytes transferred, it would be something like 400
MB/second.  This is to distant users who are happy to get a few KB/second.
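
To show how such a figure falls out of the log, here is a minimal Python sketch that
sums the logged size field per one-second timestamp for the excerpt above. It
assumes the log uses the Common Log Format, as the excerpt suggests; the regular
expression and the function name are written for this example and are not taken
from any billing tool. It only adds up what the log says and makes no claim about
what was actually sent.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)

PREFIX = '211.162.235.226 - - [03/Feb/2007:01:08:{sec} -0800] '
REQUEST = '"GET /data/island_8k.zip HTTP/1.1" 206 {size}'

# (second, logged size) pairs copied from the excerpt, in order.
ENTRIES = [
    ("12", 10359197), ("12", 42725451), ("12", 15765857), ("12", 37503695),
    ("12", 21176362), ("12", 5127115), ("12", 10359197), ("12", 42725451),
    ("12", 37503695), ("13", 15765857), ("13", 21176362),
]
SAMPLE = [(PREFIX + REQUEST).format(sec=s, size=n) for s, n in ENTRIES]


def logged_bytes_per_second(lines):
    """Sum the logged size field for each distinct timestamp."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match["size"] == "-":
            continue  # skip malformed lines and entries with no size
        totals[match["time"]] += int(match["size"])
    return dict(totals)


totals = logged_bytes_per_second(SAMPLE)
for timestamp, total in totals.items():
    print(f"{timestamp}  {total / 1e6:.1f} MB logged")
```

For this excerpt alone, the entries stamped 01:08:12 add up to roughly 223 MB of
logged bytes in a single second. The excerpt may not contain every entry for that
second, so the true logged total could be higher.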


   So, this raises a couple of interesting questions.


   1. Is HE.net making a mistake in using access_log to count bandwidth?
    1a. What would be the right way to do it?
    1b. Are they doing it this way because they're using some existing tool that does it 
this way?
    1c. Are other hosts doing it this way, and therefore are mistakenly over-measuring and 
over-charging customers?

   2. What exactly does the number after the 206 code in the access_log mean? Is it simply 
the range the client _requested_ via the Range header? If so, does it have no real 
relationship to how much data was _actually_ transferred?


   Appreciate any insight from anyone. As I said before, we resolved the billing problem 
with HE.net without any problem, and have been very happy with their service. But, if this 
is a common mistake (bandwidth measurement via access_log) then the nature of the 
misunderstanding should be made more public so that others can ensure they aren't burned 
by it.

   I apologize for any mistakes in terminology or assumptions in my message. I'm not an 
Apache guru and I don't play one on TV. I'm a 3D graphics programmer.


-- 
      Chris 'Xenon' Hanson | Xenon @ 3D Nature | http://www.3DNature.com/
  "I set the wheels in motion, turn up all the machines, activate the programs,
   and run behind the scenes. I set the clouds in motion, turn up light and sound,
   activate the window, and watch the world go 'round." -Prime Mover, Rush.


