From Pulkit Kapur <pka...@seas.upenn.edu>
Subject Re: Fwd: Trouble reading IEEE pdf
Date Thu, 02 Feb 2017 20:12:14 GMT
I am getting just the headers:
"2016 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS)
Daejeon Convention Center
October 9-14, 2016, Daejeon, Korea
978-1-5090-3761-2/16/$31.00 ©2016 IEEE 5324
5325
5326
5327
5328
5329
5330
5331
"
I did use the new file paths:
javaaddpath('C:\Users\XXX\Downloads\New folder\pdfParseDemo\PDFBox-app-2.0.4.jar')
javaaddpath('C:\Users\XXX\Downloads\New folder\pdfParseDemo\FontBox-2.0.4.jar')

On Thu, Feb 2, 2017 at 3:11 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> Am 02.02.2017 um 20:26 schrieb Pulkit Kapur:
>
>> Thanks. That's what I would expect to read.
>> Also, thanks for pointing to the latest version. I pointed to the
>> pdfbox-app-2.0.4.jar and the fontbox-2.0.4.jar files.
>>
>> Since I want to read over 1000 PDF documents programmatically in MATLAB, I
>> am not using the command line but the Java library from within MATLAB.
>> Not sure why I am still *not* getting the text using getText():
>> {code}
>> pdfdoc = org.pdfbox.pdmodel.PDDocument;
>> pdfdoc.close;
>> reader = org.pdfbox.util.PDFTextStripper;
>>
>> % list all the pdf files in the current folder
>> % listing = dir('**/*.pdf');
>> listing = dir('*.pdf');
>>
>>      pdfdoc = pdfdoc.load(fullfile(listing(i).folder,listing(i).name));
>>      pdfdoc.isEncrypted;
>>
>>      %% text, with plenty of padding
>>      pdfstr = reader.getText(pdfdoc);                 %#ok
>>      pdfdoc.close
>> {code}
>>
>>
>
> Are you getting nothing at all? Or just not all?
>
> Make sure you cleaned your class path.
>
>
> Tilman
>
>
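
For reference, a minimal sketch of the same loop written against the PDFBox 2.0.x package names (org.apache.pdfbox.pdmodel.PDDocument and org.apache.pdfbox.text.PDFTextStripper) instead of the older org.pdfbox names in the quoted code. It assumes the 2.0.4 jars are the only PDFBox jars on the MATLAB Java class path. jarDir is a hypothetical placeholder for the folder that holds them, and it is not confirmed here that this alone recovers the body text of the IEEE file.

{code}
% jarDir is a placeholder (assumption): the folder containing the 2.0.4 jars
jarDir = 'C:\path\to\pdfParseDemo';
javaaddpath(fullfile(jarDir, 'pdfbox-app-2.0.4.jar'));
javaaddpath(fullfile(jarDir, 'fontbox-2.0.4.jar'));

% list all the pdf files in the current folder
listing = dir('*.pdf');
texts = cell(numel(listing), 1);

% in 2.0.x, PDFTextStripper lives in the text package, not util
stripper = org.apache.pdfbox.text.PDFTextStripper();

for i = 1:numel(listing)
    pdfFile = java.io.File(fullfile(listing(i).folder, listing(i).name));
    % load is a static method; no PDDocument instance is needed beforehand
    pdfdoc = org.apache.pdfbox.pdmodel.PDDocument.load(pdfFile);
    texts{i} = char(stripper.getText(pdfdoc));   % Java String -> MATLAB char
    pdfdoc.close();
end
{code}

If which('org.pdfbox.pdmodel.PDDocument') or the output of javaclasspath still shows an older PDFBox jar, that jar would need to be removed first, in line with the advice to clean the class path.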
