pdfbox-users mailing list archives

From Lorena Leishman <lorenaleish...@yahoo.com.INVALID>
Subject Re: PDFTextStripper not working for some pdfs
Date Wed, 03 Jun 2015 17:44:34 GMT
Thanks for your reply. I did some more testing and it looks like it was just that one file
having the problem. All other files with the same formatting were processed properly. We think
something went wrong during the downloading process. 
Thanks again,
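
For anyone who finds this thread with the same symptom: a damaged download can sometimes be spotted before the file ever reaches PDFTextStripper. The sketch below uses only the Java standard library and checks two things a complete PDF normally has, a "%PDF-" header near the start and a "%%EOF" marker near the end. The class name PdfSanityCheck, its method names, and the 1024-byte scan window are assumptions made for this illustration and are not part of PDFBox. Nothing in this thread confirms what was actually wrong with the file, so treat this as a cheap first check rather than a diagnosis.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; it is not part of PDFBox.
final class PdfSanityCheck {

    // Number of bytes scanned at each end of the file (an assumed, arbitrary size).
    private static final int WINDOW = 1024;

    // True if "%PDF-" appears within the first WINDOW bytes.
    static boolean hasHeader(byte[] data) {
        return window(data, 0, Math.min(data.length, WINDOW)).contains("%PDF-");
    }

    // True if "%%EOF" appears within the last WINDOW bytes.
    static boolean hasEofMarker(byte[] data) {
        int from = Math.max(0, data.length - WINDOW);
        return window(data, from, data.length).contains("%%EOF");
    }

    // A file that passes both checks is not obviously truncated or replaced.
    static boolean looksComplete(byte[] data) {
        return hasHeader(data) && hasEofMarker(data);
    }

    // Decode a byte range as ISO-8859-1 so every byte maps to exactly one char.
    private static String window(byte[] data, int from, int to) {
        return new String(data, from, to - from, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfSanityCheck <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksComplete(data)
                ? "header and %%EOF marker found"
                : "file looks truncated or is not a PDF");
    }
}
```

Passing this check does not prove the file is intact, since damage in the middle of the file goes unnoticed. Comparing the file size or a checksum against the original source is a stronger test when that is available.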

      From: John Hewson <john@jahewson.com>
 To: users@pdfbox.apache.org; Lorena Leishman <lorenaleishman@yahoo.com> 
 Sent: Tuesday, June 2, 2015 1:20 AM
 Subject: Re: PDFTextStripper not working for some pdfs

> On 1 Jun 2015, at 18:00, Lorena Leishman <lorenaleishman@yahoo.com.INVALID> wrote:
> Hi,
> I have been using PDFTextStripper to get the text from different pdfs and it has worked
> great. Today I tried it on a new document and it didn't return any text. I just got a file
> with about 20 blank lines. I didn't get any error messages. I can't upload the pdf because
> it has a lot of personal information. Any ideas why this would be happening?
> Lorena

Try opening the PDF in Acrobat and copying and pasting the text. Do you get a better result?
Also try using the latest 2.0 trunk of PDFBox from SVN.

— John
