pdfbox-users mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: PDF headers/footers?
Date Fri, 25 Jul 2008 15:08:08 GMT
Well, the painting part of PDF really doesn't have a concept of headers
and footers. There's just text and graphics. Only with tagged PDF
there's some additional information identifying a part of the document
to be part of a header or footer. But I don't think that PDFBox can use
that information yet. And your PDF would have to be tagged in the first
place.

What I've seen in PDFBox is text extraction by area. If you know where
your header and footer are on the page, you can give PDFTextStripperByArea
an area to restrict the search to. There's sample code:
ExtractTextByArea in the PDFBox dist.
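
A rough sketch of how the header and footer areas could be computed, using
only the JDK so it runs without PDFBox on the classpath. The class name, the
method names and the 72 pt band height are made up for illustration. It
assumes the areas are given with the origin at the top-left of the page and y
growing downwards, which should be checked against the ExtractTextByArea
sample. Each rectangle would then be handed to
PDFTextStripperByArea.addRegion(name, rect) before extracting the page.

```java
import java.awt.geom.Rectangle2D;

public class HeaderFooterRegions {

    // Band across the top of the page (assumes a top-left origin, y grows downwards).
    public static Rectangle2D.Double header(double pageWidth, double pageHeight, double bandHeight) {
        double h = Math.min(bandHeight, pageHeight); // never taller than the page
        return new Rectangle2D.Double(0, 0, pageWidth, h);
    }

    // Band across the bottom of the page, in the same coordinate system.
    public static Rectangle2D.Double footer(double pageWidth, double pageHeight, double bandHeight) {
        double h = Math.min(bandHeight, pageHeight);
        return new Rectangle2D.Double(0, pageHeight - h, pageWidth, h);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 pt; the 72 pt (1 inch) band is a guess
        // that has to be tuned for the actual documents.
        Rectangle2D.Double top = header(612, 792, 72);
        Rectangle2D.Double bottom = footer(612, 792, 72);
        System.out.println("header area: " + top);
        System.out.println("footer area: " + bottom);
    }
}
```

This only works if the header and footer sit at the same place on every page;
anything else that falls inside the bands gets picked up as well.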

Disclaimer: I've never tried this before so it might not work.


On 25.07.2008 16:42:02 Dmitry Goldenberg wrote:
> That's kind of what I suspected.  Although, in Acrobat, you can do
> Document->Add header and footer and it really does allow you to add 3
> header values (left, center, right) and 3 footer values (left, center,
> right).  I suspect they all get treated as text internally.  Or maybe
> not? ...
> ________________________________________
> From: Jukka Zitting [jukka.zitting@gmail.com]
> Sent: Friday, July 25, 2008 8:35 AM
> To: pdfbox-users@incubator.apache.org
> Subject: Re: PDF headers/footers?
> Hi,
> On Wed, Jul 23, 2008 at 9:12 PM, Dmitry Goldenberg
> <DGoldenberg@attivio.com> wrote:
> > Can anyone tell me if there is a way in PDFBox to extract the headers and footers
> > for a PDF document and if so, what it is (i.e. a code snippet would be much appreciated).
> AFAIK that's not possible in general as (AFAIK) internally there is no
> "header" or "footer" concept in the PDF format.
> You could perhaps achieve something like that if you can manually (or
> with some heuristics) specify the header and footer areas within a
> page.
> BR,
> Jukka Zitting

Jeremias Maerki
