tomcat-users mailing list archives

From André Warnier
Subject Re: PDF Download problem tomcat >= 7.0.27
Date Wed, 01 Aug 2012 07:54:20 GMT
Konstantin Kolinko wrote:
> 2012/8/1 Jose María Zaragoza:
>>> The Content-Length header in the above 206 response is not from Tomcat.
>>> Tomcat's DefaultServlet does not calculate the whole size of the parts
>>> and does not set content-length, and the file size is much more than
>>> fits into the buffer.
>>> So it would use  Transfer-Encoding: chunked  in its response and not
>>> the one that you cited.
>>> There must be some proxy in the way that buffers the data and sends
>>> them as one response instead of chunks. HTTPD? Was there some option
>>> in it that disables chunked encoding when interacting with IE?
>> Well, I don't know that much, but this isn't about chunked
>> encoding, it's about Partial Content support, right?
>> And partial content is requested by the client (IE) if the
>> Content-Length is very big (I guess...).
>> Maybe IE requests a PDF file (GET) and, if it sees a very large
>> Content-Length, stops downloading and re-sends a GET request with a
>> byte range.
>> Chrome seems to behave in a similar way.
> 1. I suspect that the content is requested not by IE, but by the Adobe
> Acrobat plugin.
> The "User-Agent" header says that it was IE6, but it is hard to
> imagine why the browser by itself would request that strange byte
> range, asking for the tail of the file first. So there is something
> else that uses the browser to perform the request.
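
To make the quoted point about chunked encoding concrete, here is a minimal sketch of HTTP/1.1 chunked framing (RFC 7230, section 4.1): when no Content-Length is set, the body goes out as hex-length-prefixed chunks and ends with a zero-length chunk. The function names and the 8192-byte default chunk size are illustrative assumptions, not Tomcat's code; chunk extensions and trailer fields are ignored.

```python
def encode_chunked(data: bytes, chunk_size: int = 8192) -> bytes:
    """Frame a body with HTTP/1.1 chunked transfer coding (no trailers)."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Each chunk: its size in hex, CRLF, the bytes themselves, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    # A zero-length chunk followed by an empty trailer section ends the body.
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body (chunk extensions and trailers ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Drop any ";ext" part of the size line before parsing the hex size.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

An intermediary that buffers the whole response could decode this framing and re-send the body with a single Content-Length, which is the kind of behaviour suspected in the quoted message.
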
Speaking of PDF files, there may be a good reason for such behaviour.

A PDF file is not just a sequential, text-like file. It is more like an indexed file
containing tables of pointers which point to more or less randomly organised chunks of
data inside the file. And, as per the Adobe PDF 1.7 Reference:

3.4.4 File Trailer
The trailer of a PDF file enables an application reading the file to
quickly find the cross-reference table and certain special objects.
Applications should read a PDF file from its end. The last line of the
file contains only the end-of-file marker, %%EOF. (See implementation
note 18 in Appendix H.) The two preceding lines contain the keyword
startxref and the byte offset from the beginning of the file to the
beginning of the xref keyword in the last cross-reference section.

And Note 18 in Appendix H essentially says that Acrobat Reader is "tolerant" with respect
to the above, and accepts a PDF if the %%EOF marker is located within the last 1024 bytes
of the file.
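
The trailer lookup described in the quoted section can be sketched as follows, assuming the reader already holds the file bytes (or at least its tail). It searches only the last 1024 bytes, mirroring the Note 18 tolerance, and returns the number written after "startxref". The function name and the `window` parameter are hypothetical; this is not Acrobat's actual logic, and it does not go on to parse the cross-reference section itself.

```python
def find_startxref(pdf: bytes, window: int = 1024) -> int:
    """Return the byte offset recorded after 'startxref' in a PDF trailer.

    Only the last `window` bytes are searched, mirroring the tolerance that
    %%EOF may sit anywhere in the final 1024 bytes of the file.
    """
    tail = pdf[-window:]
    eof = tail.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker near the end of the file")
    # Use the last 'startxref' before the last '%%EOF' (incremental updates
    # append new trailers, so the final one is the current one).
    kw = tail.rfind(b"startxref", 0, eof)
    if kw == -1:
        raise ValueError("no startxref keyword before %%EOF")
    # The token between 'startxref' and '%%EOF' is the offset of the last
    # cross-reference section, counted from the start of the file.
    return int(tail[kw + len(b"startxref"):eof].split()[0])
```

The point for this thread is that a client needs only the final kilobyte or so of the file to learn where to jump next.
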

So it is not beyond belief that a smart browser PDF plugin would first request the last
chunk of the file, in order to retrieve the pointers to the contents of the first page of
the PDF. With those, it could quickly fetch the range of bytes corresponding to that first
page and display it in the browser window, while retrieving the rest on demand later (as
the user scrolls). (*)
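
In HTTP terms, "the last chunk of the file" maps naturally onto a suffix byte range such as "Range: bytes=-1024", which asks for the final 1024 bytes without the client knowing the total length. Below is a minimal sketch of how a server might resolve such a header into absolute offsets, following a simplified reading of RFC 7233 (no If-Range, no coalescing of overlapping ranges). The function name is hypothetical and this is not Tomcat's DefaultServlet code.

```python
def resolve_byte_ranges(range_value: str, length: int) -> list[tuple[int, int]]:
    """Turn a Range value such as "bytes=-1024" into inclusive (first, last)
    offsets for a representation of `length` bytes.

    Unsatisfiable specs are skipped; syntactically invalid ones raise.
    """
    unit, sep, spec = range_value.partition("=")
    if not sep or unit.strip().lower() != "bytes":
        raise ValueError("not a bytes range")
    ranges = []
    for part in spec.split(","):
        first, dash, last = part.strip().partition("-")
        if not dash:
            raise ValueError(f"malformed range spec: {part!r}")
        if first == "":
            # Suffix range "-N": the final N bytes, whatever the total length.
            n = int(last)
            if n > 0 and length > 0:
                ranges.append((max(length - n, 0), length - 1))
        else:
            start = int(first)
            if last != "" and int(last) < start:
                raise ValueError(f"invalid range spec: {part!r}")
            if start < length:
                # An open end, or one past the last byte, is clamped.
                end = length - 1 if last == "" else min(int(last), length - 1)
                ranges.append((start, end))
    return ranges
```

For example, `resolve_byte_ranges("bytes=-1024", 5000)` gives `[(3976, 4999)]`, i.e. the tail a plugin would fetch first.
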

And if this is not the real explanation for the behaviour we are seeing, at least it is a
clever one.

Now, how all this works in conjunction with the behaviour of HTTP proxies/gateways with
respect to Range requests and buffering is left as an exercise for the reader.
(Who can start by trying to understand
But it is not beyond belief either that a couple of obscure bugs lurk somewhere in there,
showing up only in very specific circumstances.

(*) The attentive reader will have noticed a possible flaw in this explanation: in the
case at hand, the browser/plugin requests two byte ranges in a single Range request, the
end-of-file chunk but also a chunk in the middle. How does it already know which second
range to request?
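
When a single request asks for two ranges like this, a server answers 206 Partial Content with "Content-Type: multipart/byteranges; boundary=...", and each part carries its own Content-Range (RFC 7233, Appendix A). Below is a minimal sketch of building such a body, assuming the ranges have already been resolved to inclusive offsets. The function name and default boundary string are hypothetical, and this does not claim to reproduce Tomcat's exact output.

```python
def multipart_byteranges(data: bytes, ranges, boundary: str = "THIS_STRING_SEPARATES",
                         part_type: str = "application/pdf") -> bytes:
    """Build the body of a 206 response carrying several byte ranges.

    `ranges` holds inclusive (first, last) offsets into `data`.
    """
    total = len(data)
    out = bytearray()
    for first, last in ranges:
        # Each part repeats the media type and states which slice it carries.
        head = (f"--{boundary}\r\n"
                f"Content-Type: {part_type}\r\n"
                f"Content-Range: bytes {first}-{last}/{total}\r\n\r\n")
        out += head.encode("ascii")
        out += data[first:last + 1]
        out += b"\r\n"
    # The closing delimiter is the boundary followed by two hyphens.
    out += f"--{boundary}--\r\n".encode("ascii")
    return bytes(out)
```

A client reading this response relies on each part's Content-Range header to know which slice of the file it has received.
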
