pdfbox-users mailing list archives

From Gilad Denneboom <gilad.denneb...@gmail.com>
Subject Re: Unable to get text from this pdf - why?
Date Thu, 14 Nov 2013 15:42:33 GMT
It seems that Ghostscript (GS) converted the text in the file to graphical
elements. You can see this in Acrobat if you open the Contents panel, and you
can also see that the text in the file is not selectable, and therefore can't
be extracted.
You'll need to look for a solution in GS. It has nothing to do with how
PDFBox works, as there's just no text to read in that file.
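
If it helps to catch this case in the Java app before any further processing, here is a minimal sketch of a guard that reports when an extraction result contains no visible characters. The class and method names are made up for illustration and the helper uses only the JDK. In PDFBox 1.8.x the input string would normally come from PDFTextStripper.getText(PDDocument), which is left as a comment so the sketch does not depend on a particular PDFBox version.

```java
// Hypothetical helper, not part of PDFBox: decides whether a text
// extraction result contains anything a reader could actually see.
final class ExtractedTextCheck {

    private ExtractedTextCheck() {
        // static helper only
    }

    /**
     * Returns true if the extracted text has at least one character that is
     * neither whitespace (newline, tab, form feed, ...) nor a Unicode space
     * separator such as a non-breaking space.
     */
    static boolean hasVisibleText(String extracted) {
        if (extracted == null) {
            return false;
        }
        for (int i = 0; i < extracted.length(); ) {
            int cp = extracted.codePointAt(i);
            if (!Character.isWhitespace(cp) && !Character.isSpaceChar(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: with PDFBox 1.8.x this string would be obtained with
        //   new org.apache.pdfbox.util.PDFTextStripper().getText(document)
        // Here a stand-in value simulates a file whose text was outlined.
        String extracted = "\n\n";
        if (!hasVisibleText(extracted)) {
            System.err.println("No extractable text; the PDF may contain only graphics.");
        }
    }
}
```

A result that is empty or only line breaks suggests the text was turned into graphics upstream, which is the situation described above, so the fix would still have to happen on the GS side.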


On Thu, Nov 14, 2013 at 4:29 PM, James Green <james.mk.green@gmail.com> wrote:

> This was created via a fairly obtuse means but suffice it to say it should
> still work.
>
> https://www.dropbox.com/s/uaq5sqmlf88108p/sample-from-pdf.pdf
>
> This was me creating a document in LibreOffice Writer, exporting that as a
> PDF, then loading the PDF into Document Viewer (Evince, although Adobe
> Reader could also be used). This is then printed to a Java application via
> the Windows PScript DLL, where the Java app runs the received PostScript
> through Ghostscript to get a PDF, which is finally imported into PDFBox.
>
> This used to work a few weeks ago, and we are unsure why it does not now.
> Printing an odt directly from Writer into the Java app works fine.
>
> This is using PDFBox 1.8.2.
>
> Thanks,
>
> James
>
