pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: use embedded fonts to write text
Date Thu, 07 Mar 2013 09:48:53 GMT
Hi Lukas,

The PDF specification defines several different font formats. PDFBox supports them through
the PDFont class [1] and its subclasses, together with the FontBox library. Not all of these
fonts contain 'real' characters; some might just be 'curves'. Fonts can also be embedded in
the PDF or merely referenced from it. Let's assume the text you are trying to reprint is based
on a TrueType font which is embedded. Then there is something called 'subsetting': not all
characters of the font are embedded into the PDF, only the ones needed to represent the
existing text. And then there is encoding …
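
To make the subsetting point a bit more concrete: the PDF spec says that the name of a subset
font starts with a tag of six uppercase letters followed by a plus sign (for example
"ABCDEF+Arial"). The following is only a minimal sketch in plain Java, not PDFBox code. The
class and method names are made up for illustration, and it assumes you have already obtained
the font's name as a String from the PDFont (how you do that depends on the PDFBox version).

```java
/** Minimal sketch: recognise the subset tag in a font name such as "ABCDEF+Arial". */
final class SubsetTag {
    private SubsetTag() {}

    /** Returns true if the name starts with six uppercase ASCII letters followed by '+'. */
    static boolean isSubsetName(String fontName) {
        if (fontName == null || fontName.length() < 7 || fontName.charAt(6) != '+') {
            return false;
        }
        for (int i = 0; i < 6; i++) {
            char c = fontName.charAt(i);
            if (c < 'A' || c > 'Z') {
                return false;
            }
        }
        return true;
    }

    /** Returns the name without the subset tag, or the name unchanged if it has no tag. */
    static String stripSubsetTag(String fontName) {
        return isSubsetName(fontName) ? fontName.substring(7) : fontName;
    }
}
```

If isSubsetName(...) returns true, you should expect that characters which do not already
occur in the document are missing from the embedded font.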

So the code you posted works only in certain cases, as you already found out. A complete
description of PDF font handling can be found in section 9.2 of the ISO 32000 (PDF) spec.
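
As an illustration of why it only works in certain cases, here is a minimal sketch in plain
Java. It is not PDFBox code: the table from character codes to Unicode code points is
hypothetical (in a real PDF that information would come from the font's encoding or its
ToUnicode CMap), and the class and method names are made up. The idea is simply that you can
only write a string with a font if every character of the string has a code in that font.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch: can a string be written with the character codes a font provides? */
final class EncodableCheck {
    private EncodableCheck() {}

    /** Inverts a (hypothetical) table of character code -> Unicode code point. */
    static Map<Integer, Integer> invert(Map<Integer, Integer> codeToUnicode) {
        Map<Integer, Integer> unicodeToCode = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : codeToUnicode.entrySet()) {
            // If several codes map to the same character, keep whichever is seen first.
            unicodeToCode.putIfAbsent(e.getValue(), e.getKey());
        }
        return unicodeToCode;
    }

    /** Returns the index of the first character without a code, or -1 if all have one. */
    static int firstUnencodable(String text, Map<Integer, Integer> codeToUnicode) {
        Map<Integer, Integer> unicodeToCode = invert(codeToUnicode);
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!unicodeToCode.containsKey(codePoint)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    /** Returns true if every character of the text has a code in the table. */
    static boolean canEncode(String text, Map<Integer, Integer> codeToUnicode) {
        return firstUnencodable(text, codeToUnicode) < 0;
    }
}
```

With a table that only contains codes for 'H' and 'i', canEncode("Hi", table) is true and
canEncode("Hi!", table) is false. Note that the codes in such a table need not be the ASCII
or Unicode values of the characters, which is one plausible reason for text being reprinted
with wrong characters.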


I would need to review the samples you attached to give you some more hints. Unfortunately
I won't have time to do so before the start of next week. Maybe other people will provide some
additional information for you.

Maruan Sahyoun

[1] http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/font/PDFont.html


On 07.03.2013 at 10:23, Lukas Baab <19lukas@web.de> wrote:

> 
> Hi!
> 
> I want to read the text of a pdf and just write it again on the page.
> 
> In theory this is simple: Use the PDFStreamEngine to get all TextPositions of a page.
> The TextPosition has everything you need to write the text with the same font... at the right
> place. See the code below; the complete example is attached.
> 
> Unfortunately it is not that easy: Whether this solution works or not depends on the
> font of the text and how the text is embedded into the PDF.
> 
> Questions:
> Which font types and kinds of font embedding are supported by PDFBox? (Which of them can
> be reused for writing into the PDF?)
> Do I have to handle different embedded fonts differently? How?
> How can I check whether I can write some text with a font or not?
> 
> I appreciate every kind of advice and answer!
> 
> Thanks
> Lukas
> 
> 
> 
> Attachment:
> Code of TextReprintExample
> exampleFiles:
>  example 1: created with LibreOffice, the whole text is reprinted with wrong characters
>  example 2: created with Word, the text is reprinted correctly, but special characters
>  ( „ and “ ) are not reprinted
> 
> 
> 
> Here the code:
> 
> public void reprintTextTest() throws Exception {
>  PDDocument document = PDDocument.load("E:/80_tmp/test.pdf");
>  List<PDPage> allPages = document.getDocumentCatalog().getAllPages();
> 
>  for (PDPage page : allPages) {
>    List<TextPosition> textPositionsOfPage = getTextPosition(page);
>    writeText(document, page, textPositionsOfPage);
>  }
> 
>  document.save("E:/80_tmp/test-result.pdf");
>  document.close();
> }
> 
> private void writeText(PDDocument document, PDPage page, List<TextPosition> textPositions)
>     throws IOException {
>  float pageHeight = page.findMediaBox().getHeight();
>  PDPageContentStream pageContentStream =
>      new PDPageContentStream(document, page, true, true);
>  pageContentStream.setNonStrokingColor(Color.GREEN);
> 
>  for (TextPosition textPosition : textPositions) {
>    float x = textPosition.getX();
>    float y = pageHeight - textPosition.getY();
>    pageContentStream.beginText();
>    pageContentStream.moveTextPositionByAmount(x, y);
>    pageContentStream.setFont(textPosition.getFont(), textPosition.getFontSize());
>    pageContentStream.drawString(textPosition.getCharacter());
>    pageContentStream.endText();
>  }
> 
>  pageContentStream.close();
> }

