tomcat-users mailing list archives

From Rainer Jung <>
Subject Re: PDF Download problem tomcat >= 7.0.27
Date Wed, 01 Aug 2012 17:12:05 GMT
On 01.08.2012 09:54, André Warnier wrote:
> Konstantin Kolinko wrote:
>> 2012/8/1 Jose María Zaragoza <>:
>>>> The Content-Length header in the above 206 response is not from Tomcat.
>>>> Tomcat's DefaultServlet does not calculate the whole size of the parts
>>>> and does not set Content-Length, and the file is much larger than
>>>> the buffer.
>>>> So it would use "Transfer-Encoding: chunked" in its response, not
>>>> the one that you cited.
>>>> There must be some proxy in the way that buffers the data and sends
>>>> them as one response instead of chunks. HTTPD? Was there some option
>>>> in it that disables chunked encoding when interacting with IE?
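A minimal sketch of the difference Konstantin describes, assuming Python and in-memory byte strings. Both function names are hypothetical; this is not Tomcat or HTTPD code. `encode_chunked` frames a body whose total length is not known in advance, and `decode_chunked` shows how a buffering intermediary could reassemble it and then send a single `Content-Length` instead.

```python
def encode_chunked(pieces):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would signal the end of the body
        # chunk-size in hex, CRLF, chunk-data, CRLF
        out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk with no trailer fields
    return bytes(out)


def decode_chunked(body):
    """Reassemble a chunked body, as a buffering proxy might before setting Content-Length."""
    data = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)  # ignore chunk extensions
        pos = line_end + 2
        if size == 0:
            break  # last-chunk reached (trailers are not handled in this sketch)
        data += body[pos:pos + size]
        pos += size + 2  # skip the CRLF that follows the chunk data
    return bytes(data)
```

For example, a proxy that buffers the whole body could send `len(decode_chunked(wire))` as `Content-Length`, which would yield a response like the one cited even if the origin server had sent it chunked.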
>>> Well, I don't know much about it, but isn't that about Partial Content
>>> support rather than chunked encoding?
>>> And partial content is requested by the client (IE) if the
>>> Content-Length is very large (I guess...).
>>> Maybe IE requests a PDF file (GET) and, if it sees a very large
>>> Content-Length, stops downloading and re-sends a GET request with a
>>> byte range.
>>> Chrome seems to do something similar.
>> 1. I suspect that the content is requested not by IE, but by the Adobe
>> Acrobat plugin.
>> The "User-Agent" header says that it was IE6, but it is hard to
>> imagine why the browser by itself would request that strange byte
>> range, asking for the tail of the file first. So there is something
>> else that uses the browser to perform the request.
> +1
> Talking about PDF files, there is a possible good reason for such a
> behaviour.
> A PDF file is not just a sequential text-like file. It is more like an
> indexed file containing tables of pointers which point to more or less
> randomly organised chunks of data inside the file. And, as per the Adobe
> PDF 1.7 Reference:
> 3.4.4 File Trailer
> The trailer of a PDF file enables an application reading the file to
> quickly find the cross-reference table and certain special objects.
> Applications should read a PDF file from its end. The last line of the
> file contains only the end-of-file marker, %%EOF. (See implementation
> note 18 in Appendix H.) The two preceding lines contain the keyword
> startxref and the byte offset from the beginning of the file to the
> beginning of the xref keyword in the last cross-reference section.
> etc.
> ...
> And Note 18 in Appendix H essentially says that Acrobat Reader is
> "tolerant" with respect to the above, and accepts a PDF if the %%EOF
> marker is located within the last 1024 bytes of the file.
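Assuming a reader has already fetched the file's tail, for example with `Range: bytes=-1024`, locating the cross-reference table could look like the sketch below. It is a hypothetical helper built on a regular expression, not Acrobat's implementation: it looks for the last `%%EOF` within the final 1024 bytes (the tolerance from note 18) and returns the offset that follows the preceding `startxref` keyword.

```python
import re

EOF_WINDOW = 1024  # note 18: %%EOF is accepted anywhere in the last 1024 bytes


def find_startxref(tail):
    """Return the byte offset recorded after the last 'startxref' keyword.

    `tail` is assumed to hold the final bytes of the file, e.g. the body of
    a 'Range: bytes=-1024' response.
    """
    window = tail[-EOF_WINDOW:]
    eof = window.rfind(b"%%EOF")  # last marker wins after incremental updates
    if eof < 0:
        raise ValueError("no %%EOF marker in the last 1024 bytes")
    matches = list(re.finditer(rb"startxref\s+(\d+)", window[:eof]))
    if not matches:
        raise ValueError("no startxref keyword before %%EOF")
    return int(matches[-1].group(1))
```

The returned value is the offset of the last cross-reference section, so a client could follow up with a second range request starting at that byte.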
> So it is not beyond belief that a smart browser PDF plugin would first
> request the last chunk of the file, in order to retrieve the pointers to
> the contents of the first page, then quickly retrieve the byte range
> corresponding to that page and display it in the browser window, while
> retrieving the rest on demand (as the user scrolls). (*)
> And if this is not the real explanation for the behaviour we are seeing,
> at least it is a clever one.
> Now how this all works in conjunction with the behaviour of HTTP
> proxies/gateways with respect to Range requests and buffering, is left
> as an exercise for the reader.
> (Who can start by trying to understand
> But that there would exist a couple of obscure bugs somewhere in there,
> which show up only in very specific circumstances, is not beyond belief
> either.
> (*) The attentive reader will have noticed a possible flaw in this
> explanation: in the case at hand, the browser/plugin requests two
> byte ranges in a single Range request: the end-of-file chunk, but also
> a chunk in the middle. How does it already know which second range to
> request?
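Whatever the plugin knows in advance, asking for two ranges in one request means the 206 response body is `multipart/byteranges`, with a boundary line and a separate `Content-Range` header for each part (RFC 7233 Appendix A). Below is a minimal sketch of that layout, assuming the whole file is in memory and the caller picks a boundary that does not occur in the data; the function name is hypothetical. Because the body length depends on every part header, a server would have to format all parts before it could set `Content-Length`, which is one reason a streaming server may send such a response chunked, as Konstantin describes for DefaultServlet.

```python
def multipart_byteranges(data, ranges, content_type, boundary):
    """Build a multipart/byteranges body for inclusive (first, last) ranges.

    The boundary is chosen by the caller and must not occur in `data`.
    """
    total = len(data)
    out = bytearray()
    for first, last in ranges:
        out += f"--{boundary}\r\n".encode("ascii")
        out += f"Content-Type: {content_type}\r\n".encode("ascii")
        # each part states its own range and the full length of the file
        out += f"Content-Range: bytes {first}-{last}/{total}\r\n\r\n".encode("ascii")
        out += data[first:last + 1]
        out += b"\r\n"  # this CRLF belongs to the next delimiter
    out += f"--{boundary}--\r\n".encode("ascii")  # closing delimiter
    return bytes(out)
```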

Adobe calls range requests in the context of Acrobat "Fast Web View".
When you generate a PDF you can choose whether or not to support it. I
guess there will at least be an index at the beginning of the document
giving the byte ranges for each page. Usually Acrobat then fetches just
the first page plus the index. If you switch to a different page, it
loads only the byte range needed for that page.
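The guessed index corresponds to what the PDF Reference calls a linearized file, the format behind Fast Web View. Such a file begins with a linearization parameter dictionary that must lie within the first 1024 bytes. Its entries include `/L` (file length), `/O` (object number of the first page), `/E` (offset of the end of the first page), `/N` (page count) and `/H` (location of the hint stream). The sketch below is a heuristic regex scan with a hypothetical function name, not a real PDF parser: if the first 1024 bytes look linearized, it reports the dictionary's numeric entries.

```python
import re


def linearization_params(head):
    """Return the numeric entries of a linearization dictionary, or None.

    `head` is assumed to be the first bytes of the file (e.g. 'Range: bytes=0-1023').
    Heuristic only: it scans text instead of parsing PDF objects, and skips
    array-valued entries such as /H.
    """
    window = head[:1024]
    match = re.search(rb"<<[^>]*?/Linearized\b[^>]*>>", window)
    if match is None:
        return None  # not linearized, or not recognisable by this scan
    params = {}
    for key, value in re.findall(rb"/([A-Za-z]+)\s+(\d+(?:\.\d+)?)", match.group(0)):
        params[key.decode("ascii")] = float(value) if b"." in value else int(value)
    return params
```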

How does it know the second range? Perhaps it had already made an
earlier request to collect all the index data it needs.


