pdfbox-users mailing list archives

From Bernard Segonnes <bsegon...@gmail.com>
Subject PDFBox 0.8.0 : text extraction
Date Mon, 28 Dec 2009 14:31:01 GMT

I am new to this list, although I have been using PDFBox for some time now.

I have found some strange behaviour.  Text extraction seems to work on some
PDFs and not on others, even though they all use standard ASCII characters
and the text can be selected in Acrobat (so it is not an image).
In other cases, only part of a page is extracted.

Can someone explain why?

I have a big PDF (5 MB) for which extraction doesn't work.  Even extracting
text from excel.pdf, provided in the PDFBox source package, doesn't work.

I have neither the resources (time) nor the knowledge to help debug PDFBox,
but I need to extract text from PDF files, so I am doing a lot of tests.


Bernard Segonnes

