tomcat-users mailing list archives

From Pid <...@pidster.com>
Subject Re: PDF Download problem tomcat >= 7.0.27
Date Wed, 01 Aug 2012 09:50:24 GMT
On 01/08/2012 08:54, André Warnier wrote:
> Konstantin Kolinko wrote:
>> 2012/8/1 Jose María Zaragoza <demablogia@gmail.com>:
>>>> The Content-Length header in the above 206 response is not from Tomcat.
>>>>
>>>> Tomcat's DefaultServlet does not calculate the whole size of the parts
>>>> and does not set content-length, and the file size is much more than
>>>> fits into the buffer.
>>>> So it would use  Transfer-Encoding: chunked  in its response and not
>>>> the one that you cited.
>>>> There must be some proxy in the way that buffers the data and sends
>>>> them as one response instead of chunks. HTTPD? Was there some option
>>>> in it that disables chunked encoding when interacting with IE?
>>>
>>> Well, I don't know much about this, but isn't that related to Partial
>>> Content support rather than chunked encoding?
>>> And partial content is requested by the client (IE) if the
>>> Content-Length is very large (I guess...).
>>> Maybe IE requests a PDF file (GET) and, if it sees a very large
>>> Content-Length, aborts the download and re-sends a GET request with a
>>> byte range.
>>>
>>> Chrome appears to do something similar.
>>>
>>
>> 1. I suspect that the content is requested not by IE, but by the Adobe
>> Acrobat plugin.
>>
>> The "User-Agent" header says that it was IE6, but it is hard to
>> imagine why the browser by itself would request such a strange byte
>> range, asking for the tail of the file first. So there is something
>> else that uses the browser to perform the request.
>>
> +1
> Talking about PDF files, there is a possible good reason for such a
> behaviour.
> 
> A PDF file is not just a sequential text-like file.  It is more like an
> indexed file containing tables of pointers which point to more or less
> randomly organised chunks of data inside the file. And, as per the Adobe
> PDF 1.7 reference:
> 
> 3.4.4 File Trailer
> The trailer of a PDF file enables an application reading the file to
> quickly find the cross-reference table and certain special objects.
> Applications should read a PDF file from its end. The last line of the
> file contains only the end-of-file marker, %%EOF. (See implementation
> note 18 in Appendix H.) The two preceding lines contain the keyword
> startxref and the byte offset from the beginning of the file to the
> beginning of the xref keyword in the last cross-reference section.
> etc..
> ...
> And Note 18 in Appendix H essentially says that Acrobat reader is
> "tolerant" with respect to the above, and accepts a PDF if the %%EOF
> marker is located within the last 1024 bytes of the file.
> 
> So it is not beyond belief that a smart browser PDF plugin would first
> request the last chunk of the file to retrieve the pointers to the
> contents of the first page. It could then fetch the byte range for that
> page and display it quickly in the browser window, retrieving the rest
> on demand (as the user scrolls). (*)
> 
> And if this is not the real explanation for the behaviour we are seeing,
> at least it is a clever one.
> 
> Now how this all works in conjunction with the behaviour of HTTP
> proxies/gateways with respect to Range requests and buffering, is left
> as an exercise for the reader.
> (Who can start by trying to understand
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35)
> But that there would exist a couple of obscure bugs somewhere in there,
> which show up only in very specific circumstances, is not beyond belief
> either.
> 
> 
> (*) The attentive reader will have noticed that there is a possible flaw
> in this explanation: in the case at hand, the browser/plugin requests 2
> chunks of bytes in the Range request: the end-of-file chunk, but also a
> chunk in the middle.  How does it already know which second Range to
> request?
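
To make the trailer argument quoted above concrete, here is a rough sketch
of what a reader could do with only the tail of a file. The function names
and the sample bytes are made up for illustration; this is not how Acrobat
is known to work, just the logic the quoted spec text implies.

```python
import re

def tail_range_header(n=1024):
    # Suffix byte-range: asks the server for the last n bytes of the resource.
    return "bytes=-%d" % n

def find_startxref(tail):
    """Return the offset that follows the last 'startxref' keyword before
    an %%EOF marker in `tail`, or None if no such trailer is present."""
    matches = list(re.finditer(rb"startxref\s+(\d+)\s*%%EOF", tail))
    if not matches:
        return None
    # Take the last match, since the final trailer is the one that counts.
    return int(matches[-1].group(1))

# Hypothetical tail of a PDF, as it might come back from a suffix-range GET.
sample_tail = b"trailer\n<< /Size 22 >>\nstartxref\n18799\n%%EOF\n"
print(tail_range_header())          # bytes=-1024
print(find_startxref(sample_tail))  # 18799
```

With that offset in hand, a client could ask for the cross-reference
section next, which is one plausible reason for a tail-first request.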

The PDF plugin is a PITA.

It *does* request ranges, which can be a little painful; I found this
out the hard way with some dynamically rendered PDFs.
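
For anyone generating PDFs on the fly, the server side has to cope with
headers such as "bytes=1000-1999,-1024". A minimal sketch of resolving such
a header against a known document size follows, assuming the handler can
determine the full size up front. parse_byte_ranges is a hypothetical
helper, not Tomcat's DefaultServlet code.

```python
def parse_byte_ranges(header, size):
    """Resolve a Range header into inclusive (start, end) pairs.

    Returns None if the header is malformed (a caller would typically
    ignore it and send the whole document), or a possibly empty list of
    satisfiable ranges (an empty list would map to a 416 response).
    """
    unit, sep, spec = header.partition("=")
    if not sep or unit.strip().lower() != "bytes":
        return None
    ranges = []
    for part in spec.split(","):
        first, dash, last = part.strip().partition("-")
        if not dash:
            return None
        if first == "":
            # Suffix form "-N": the last N bytes of the document.
            if not last.isdecimal():
                return None
            n = int(last)
            if n > 0 and size > 0:
                ranges.append((max(size - n, 0), size - 1))
        else:
            if not first.isdecimal() or (last and not last.isdecimal()):
                return None
            start = int(first)
            end = int(last) if last else size - 1
            if last and end < start:
                return None
            if start < size:
                # Clamp the end to the last byte actually available.
                ranges.append((start, min(end, size - 1)))
    return ranges

print(parse_byte_ranges("bytes=1000-1999,-1024", 5000))
# [(1000, 1999), (3976, 4999)]
```

The painful part with dynamic content is that every one of these requests
has to see the same bytes and the same total size, or the ranges the plugin
stitches together will not line up.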


p

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 

