pdfbox-users mailing list archives

From Jochen Hebbrecht <jochenhebbre...@gmail.com>
Subject Re: How does PDFBox extract text from a PDF?
Date Wed, 11 Jul 2012 09:48:38 GMT
Hehe, no no :-). I just read chapter 9 of http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf.
The spec is really well written! Thanks for the tip!

On 11 Jul 2012, at 01:44, Craig Ringer wrote:

> On 07/11/2012 02:32 AM, Jeremias Maerki wrote:
>> Jochen, fire up PDFBox's PDFDebugger [1] and load a few PDFs and browse
>> through the object tree. Look around. That'll give you a feeling of
>> what's in a PDF. Then download the PDF specification. It's not written
>> in Hieroglyphs or Klingon. ;-)
> In fact, it's an amazingly clear and readable specification.
> Maybe Jochen has been working with Java Server Faces (JSF2) or something recently, and
> is thus justifiably afraid to go /anywhere near/ the spec. If you've experienced a spec like
> that, wanting to avoid all of them is understandable.
> --
> Craig Ringer
