pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: pdfbox 1.8.5 PDFParser constructor issue?
Date Mon, 07 Nov 2016 15:55:15 GMT
Hi,

> On 07.11.2016 at 16:37, Mark Bobick, CTO <m.bobick@correlationconcepts.com> wrote:
> 
> Andreas,
> 
> Updated to 1.8.12
> 
> This time, I found the inputStream contents in parser.pdfSource.in.in.buf.
> However, pdfSource.buf is all zeros and Pushback is all zeros. When the nextObj
> call is made, b.length is 16, but all zeros. One other point: the PDF version
> returned is 1.5. Again, inputStream is confirmed to exist and is correct.
> 
> Code still fails, even though it seems simple:
> 
> PDDocumentInformation metadata;
> PDFTextStripper stripper;
> PDDocument document = null;
> PDFParser parser;
> InputStream inputStream;
> 
> String text;
> parser = new PDFParser(inputStream);
> parser.parse();
> document = parser.getPDDocument();
> stripper = new PDFTextStripper();
> text = stripper.getText(document);

Could you try:

PDDocument document = PDDocument.loadNonSeq(inputStream, null);
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(document);

No need to work with the parser directly.

If that still doesn't work, please upload a sample PDF to a public location.
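
Before uploading a sample, it may be worth ruling out that the stream has already been read (or is not positioned at the start) by the time it reaches PDFBox. That is only a guess at a possible cause of the all-zero buffers, not something confirmed in this thread. The sketch below uses plain java.io only; PdfHeaderCheck and startsWithPdfHeader are hypothetical names, not part of PDFBox. It peeks at the first bytes and checks for the "%PDF-" marker without consuming them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical diagnostic helper (an assumption for illustration, not a PDFBox class).
class PdfHeaderCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the next bytes of the stream are "%PDF-".
     * The stream must support mark/reset so the peeked bytes are not lost;
     * wrap it in a BufferedInputStream if it does not.
     */
    static boolean startsWithPdfHeader(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        try {
            in.mark(PDF_MAGIC.length);
            byte[] head = new byte[PDF_MAGIC.length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            in.reset(); // rewind so the parser still sees the whole document
            return total == PDF_MAGIC.length && Arrays.equals(head, PDF_MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.5\n".getBytes(StandardCharsets.US_ASCII);
        ByteArrayInputStream in = new ByteArrayInputStream(sample);

        System.out.println("fresh stream:   " + startsWithPdfHeader(in));

        // Simulate an earlier consumer that read the stream to the end.
        while (in.read() != -1) {
            // drain
        }
        System.out.println("drained stream: " + startsWithPdfHeader(in));
    }
}
```

If the check fails right before the stream is handed to PDFBox, the problem is more likely upstream of the parser than in the parser itself.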

BR
Maruan

> 
> Please advise.
> 
> Thanks & Regards,
> 
> -mark bobick
> 
> -----Original Message-----
> From: Andreas Lehmkuehler [mailto:andreas@lehmi.de] 
> Sent: Monday, November 07, 2016 12:56 AM
> To: users@pdfbox.apache.org
> Subject: Re: pdfbox 1.8.5 PDFParser constructor issue?
> 
> Hi,
> 
> On 06.11.2016 at 23:41, Mark Bobick, CTO wrote:
>> Problem with 1.8.5 PDFParser(InputStream input) constructor:  Verified
>> that input exists, and has correct length (>50Kb) when passed into the
>> parser constructor.  However, inspection of parser object at next
>> statement parser.parse() shows that pdfSource is all zeros [0,0,0,0,0.]  so
>> nothing comes back from parser into the Holder.  Pushback is also all zeros.
>> 
>> 
>> 
>> Note this is attempt to parse PDF documents that were created on
>> wintel machines, while the pdfbox is being executed on a Fedora 20 machine.
>> 
>> 
>> 
>> Any suggestions?
> Update to a more recent version, 2.0.3 or at least 1.8.12. If that doesn't
> work either, post the relevant part of the code you are using.
> 
> BR
> Andreas
> 
>> Regards,
>> 
>> 
>> 
>> -mark bobick
>> <http://www.linkedin.com/pub/mark-bobick/2/306/816/> LinkedIn
>> 
>> 
>> 
>> CTO, Correlation Concepts
>> 
>> <http://www.correlationconcepts.com/> www.correlationconcepts.com
>> 
>> 2880 David Walker Dr. #407
>> 
>> Eustis, Florida  32726
>> 
>> 702.882.5664
>> 
>> 
>> 
>> "We will find a way, or we will make one." - Hannibal
>> 
>> 
>> 
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org

