pdfbox-users mailing list archives

From Pulkit Kapur <pka...@seas.upenn.edu>
Subject Re: Fwd: Trouble reading IEEE pdf
Date Thu, 02 Feb 2017 22:42:47 GMT
Thank you all. You do a great service.
I am up and running.

Thanks,

Pulkit

On Thu, Feb 2, 2017 at 3:19 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 02.02.2017 at 21:12, Pulkit Kapur wrote:
>
>> I am getting just the headers:
>> "2016 IEEE/RSJ International Conference on Intelligent Robots and Systems
>> (IROS)
>> Daejeon Convention Center
>> October 9-14, 2016, Daejeon, Korea
>> 978-1-5090-3761-2/16/$31.00 ©2016 IEEE 5324
>> 5325
>> 5326
>> 5327
>> 5328
>> 5329
>> 5330
>> 5331
>> "
>> I did use the new file paths:
>> javaaddpath('C:\Users\XXX\Downloads\New folder\pdfParseDemo\PDFBox-app-2.0.4.jar')
>> javaaddpath('C:\Users\XXX\Downloads\New folder\pdfParseDemo\FontBox-2.0.4.jar')
>>
>
> I don't know how MATLAB works. What I mean is: delete the old PDFBox and
> FontBox versions from all directories. Then recompile (if applicable) and
> redeploy your stuff.
>
> If it still doesn't work, use a different directory or a different
> computer. I tested your file and PDFBox extracts quite a lot.
>
> Tilman
>
>> On Thu, Feb 2, 2017 at 3:11 PM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>
>> On 02.02.2017 at 20:26, Pulkit Kapur wrote:
>>>
>>> Thanks. That's what I would expect to read.
>>>> Also thanks for pointing to the latest version. I pointed to the
>>>> pdfbox-app-2.0.4.jar and the fontbox-2.0.4.jar files.
>>>>
>>>> Since I want to read over 1000 PDF documents programmatically in
>>>> MATLAB, I am not using the command line but the Java library from
>>>> within MATLAB. I am not sure why I am still *not* getting the text
>>>> using getText():
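>>>> {code}
>>>> % PDFBox 2.x classes live under org.apache.pdfbox; the old org.pdfbox
>>>> % names only resolve if a pre-Apache PDFBox jar is still on the path
>>>> reader = org.apache.pdfbox.text.PDFTextStripper;
>>>>
>>>> % list all the pdf files in the current folder
>>>> % listing = dir('**/*.pdf');
>>>> listing = dir('*.pdf');
>>>>
>>>> for i = 1:numel(listing)
>>>>       % PDDocument.load is a static method that takes a java.io.File
>>>>       pdfdoc = org.apache.pdfbox.pdmodel.PDDocument.load( ...
>>>>           java.io.File(fullfile(listing(i).folder, listing(i).name)));
>>>>       pdfdoc.isEncrypted;
>>>>
>>>>       %% text, with plenty of padding
>>>>       pdfstr = reader.getText(pdfdoc);                 %#ok
>>>>       pdfdoc.close;
>>>> end
>>>> {code}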
>>>>
>>>>
>>>> Are you getting nothing at all? Or just not all?
>>>
>>> Make sure you cleaned your class path.
>>>
>>>
>>> Tilman
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>>
>>>
>
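Tilman's advice to delete old PDFBox and FontBox jars and clean the class path can be checked mechanically. The following is a minimal sketch, assuming a plain Java program run on the same class path as your application; `ClasspathCheck`, `pdfboxEntries` and `providersOf` are hypothetical names, not part of PDFBox. It lists class path entries whose file names mention pdfbox or fontbox, and every location that supplies `PDDocument` under the 2.x package `org.apache.pdfbox` and under the pre-Apache package `org.pdfbox`.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper, not part of PDFBox.
final class ClasspathCheck {

    // Returns the class path entries whose file name mentions pdfbox or fontbox
    // (case-insensitive), in class path order.
    static List<String> pdfboxEntries(String classPath) {
        List<String> hits = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            String name = new File(entry).getName().toLowerCase(Locale.ROOT);
            if (name.contains("pdfbox") || name.contains("fontbox")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    // Returns every location that provides the given class file, as seen by
    // the class loader that loaded this helper.
    static List<URL> providersOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        String cp = System.getProperty("java.class.path");
        System.out.println("PDFBox/FontBox jars: " + pdfboxEntries(cp));
        // PDFBox 2.x location(s); more than one URL means two versions compete
        System.out.println("org.apache.pdfbox PDDocument: "
                + providersOf("org.apache.pdfbox.pdmodel.PDDocument"));
        // pre-Apache package name; any URL here means a very old jar is present
        System.out.println("org.pdfbox PDDocument: "
                + providersOf("org.pdfbox.pdmodel.PDDocument"));
    }
}
```

A clean setup for 2.0.4 should show exactly one URL for the `org.apache.pdfbox` class and none for `org.pdfbox`. Inside MATLAB, `javaclasspath('-all')` lists both the static and the dynamic Java class path, which is where entries added with `javaaddpath` end up.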
