pdfbox-users mailing list archives

From James Green <james.mk.gr...@gmail.com>
Subject Re: Unable to get text from this pdf - why?
Date Thu, 14 Nov 2013 16:02:07 GMT
Pretty much confirms our thoughts. Regrettably I don't have Acrobat (only
Reader) here but we did notice the loss of selectable text. Thanks for your
time.


On 14 November 2013 15:42, Gilad Denneboom <gilad.denneboom@gmail.com> wrote:

> It seems that GS converted the text in the file to graphical elements. You
> can see it in Acrobat if you open the Contents panel, and you can also see
> that the text in the file is not selectable, and therefore can't be
> extracted.
> You'll need to look for a solution in GS. It has nothing to do with how
> PDFBox works, as there's just no text to read in that file.
>
>
> On Thu, Nov 14, 2013 at 4:29 PM, James Green <james.mk.green@gmail.com> wrote:
>
> > This was created via a fairly obtuse means, but suffice it to say it
> > should still work.
> >
> > https://www.dropbox.com/s/uaq5sqmlf88108p/sample-from-pdf.pdf
> >
> > This was me creating a document in LibreOffice Writer, exporting that as a
> > PDF, then loading the PDF into DocumentViewer (Evince, although Adobe
> > Reader could also be used). This is then printed to a Java application via
> > the Windows PScript DLL, where the Java app runs the received PostScript
> > through Ghostscript to get a PDF, which is finally imported into PDFBox.
> >
> > This used to work a few weeks ago, and we are unsure why it does not now.
> > Printing an odt directly from Writer into the Java app works fine.
> >
> > This is using PDFBox 1.8.2.
> >
> > Thanks,
> >
> > James
> >
>
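
For anyone who wants to reproduce the PostScript-to-PDF step described above outside the full print pipeline, here is a minimal sketch in Java. The thread does not show how the Java app actually calls Ghostscript, so everything here is an assumption: the class name GhostscriptCommand, the file names, and the executable name "gs" (on Windows it is typically gswin32c or gswin64c) are hypothetical. The switches used are standard Ghostscript options for the pdfwrite device. This does not fix the loss of text; it only isolates the conversion so its output can be checked for selectable text on its own.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not taken from the application discussed in the thread.
final class GhostscriptCommand {

    // Builds the argument list for a plain PostScript -> PDF conversion.
    // gsExecutable is an assumption; pass whatever name your install uses.
    static List<String> build(String gsExecutable, String psPath, String pdfPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dBATCH");                  // exit after processing the input
        cmd.add("-dNOPAUSE");                // do not prompt between pages
        cmd.add("-sDEVICE=pdfwrite");        // write PDF output
        cmd.add("-sOutputFile=" + pdfPath);  // destination file
        cmd.add(psPath);                     // PostScript input comes last
        return cmd;
    }

    // Runs the command, forwarding Ghostscript's output to this process,
    // and returns Ghostscript's exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(...) to actually execute it.
        System.out.println(String.join(" ", build("gs", "received.ps", "converted.pdf")));
    }
}
```

Opening the resulting PDF in a viewer and trying to select text shows whether the text was already lost at this stage, before PDFBox is involved.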
