pdfbox-users mailing list archives

From Walter Kehl <walter.k...@outlook.com>
Subject RE: Downloadind a pdf file doesn't work
Date Sat, 13 Dec 2014 17:41:19 GMT
Hi John, Tilman, 

thanks for the reply. Here is some additional information:

- the HTTP client I am using to get the input stream already has a user
agent set. Also, I have already loaded lots of PDF files with PDFBox this
way without ever having a problem.
- when I try to load the document remotely from the URL, I get the following
error messages:
  18:34:32 WARN  BaseParser           :: Specified stream length 66346 is wrong. Fall back to reading stream until 'endstream'.
  18:34:35 WARN  XrefTrailerResolver  :: Did not found XRef object at specified startxref position 0
- I have written the input stream directly to a file and it was a valid PDF.
I could load it both with an external tool and with PDFBox.

Yes, of course I could always download a file first to a temp file and then
load it into PDFBox. But I think the direct way is more elegant and faster.
I have also debugged a little bit into the code and to me it doesn't look
like PDFBox uses a temporary file, but rather reads directly from the input
stream.... but I might be wrong.
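
For reference, the temp-file workaround mentioned above could look roughly like the sketch below. It uses only the JDK; the class and method names (PdfTempDownload, saveToTempFile, looksLikePdf) are made up for illustration, and the header/trailer check is only a cheap sanity test, not a real PDF validation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox.
public class PdfTempDownload {

    /** Copies the whole stream into a new temp file and returns its path. */
    public static Path saveToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("download-", ".pdf");
        // createTempFile already created the file, so it must be replaced.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Cheap sanity check: "%PDF-" at the start and "%%EOF" in the last 1024 bytes. */
    public static boolean looksLikePdf(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        String text = new String(data, StandardCharsets.ISO_8859_1);
        if (!text.startsWith("%PDF-")) {
            return false;
        }
        int tailStart = Math.max(0, text.length() - 1024);
        return text.indexOf("%%EOF", tailStart) >= 0;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HTTP response stream (assumption for the demo).
        byte[] fake = "%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n".getBytes(StandardCharsets.ISO_8859_1);
        Path tmp = saveToTempFile(new ByteArrayInputStream(fake));
        System.out.println(tmp + " looks like a PDF: " + looksLikePdf(tmp));
        Files.deleteIfExists(tmp);
    }
}
```

In real use the stream would come from the HTTP client, and the resulting file (tmp.toFile()) would then be handed to the file-based PDDocument.load overload instead of the stream.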

Anyway, thanks for providing such good free software!

Best
Walter

-----Original Message-----
From: John Hewson [mailto:john@jahewson.com] 
Sent: Friday, 12 December 2014 18:57
To: users@pdfbox.apache.org
Subject: Re: Downloadind a pdf file doesn't work

Good point, Tilman. Walter, try writing the InputStream to a File and
check that it's a valid PDF.

-- John

> On 12 Dec 2014, at 09:50, Tilman Hausherr <THausherr@t-online.de> wrote:
> 
> This sounds more like an HTTP problem. Try setting a user agent like a browser.
> 
> https://stackoverflow.com/questions/2529682/setting-user-agent-of-a-java-urlconnection
> 
> Tilman
> 
> On 12.12.2014 at 11:53, Walter Kehl wrote:
>> Hi all,
>> 
>>  
>> I have the following situation:
>> 
>>  
>> I am loading files from the internet with PdfBox using the call
>> 
>> PDDocument document = PDDocument.load( inputStream );
>> 
>>  
>> So far it has worked nicely, but I have problems with this file :
>> http://esa.un.org/unpd/wup/PressRelease/WUP2014_PressRelease.pdf
>> 
>>  
>> After I load it, the document is empty, and the call 
>> document.getNumberOfPages() returns 0.
>> 
>> However, when I download the file manually and then load it into 
>> PdfBox, everything is fine.
>> 
>>  
>> Any idea what could be happening? I am currently using PdfBox 1.8.5.
>> 
>>  
>> Thanks and Best Regards
>> 
>> Walter
>> 
>>  
>>  
>>  
>> 
> 

