pdfbox-users mailing list archives

From Candace Bain <candace.b...@gmail.com>
Subject Re: Text extract/replace using PDFBox
Date Tue, 25 Feb 2014 14:47:51 GMT
OK, our designer figured this out.  Apparently, if a string contains the
letter f followed by the letter i, it will not be stored as ASCII text.
This is apparently an artifact of typesetting, where the two letters are set
as a single ligature glyph:

http://en.wikipedia.org/wiki/Typographic_ligature

When I change the PDF template so the strings I need to replace do not
contain "fi", they are saved as ASCII text and I'm able to
programmatically replace them.
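
For anyone hitting the same thing, here is a minimal sketch of how the
workaround could be checked in code. It is not PDFBox code and uses only the
Java standard library. The class and method names (LigatureCheck,
containsLigaturePair, expandLigatures) are made up for illustration, and the
list of letter pairs is an assumption: which pairs actually become ligatures
depends on the font and the typesetting tool. The second method only helps if
the text you are looking at contains Unicode ligature characters such as
U+FB01; inside a PDF using Identity-H the stored bytes are glyph codes, so it
does not apply to the raw content stream.

```java
import java.text.Normalizer;

class LigatureCheck {
    // Assumed list of letter pairs that are commonly set as one ligature glyph.
    // "ffi" and "ffl" are covered because they contain "ff".
    private static final String[] LIGATURE_PAIRS = {"ff", "fi", "fl"};

    // Returns true if a placeholder string contains a pair that the
    // typesetting tool might replace with a ligature glyph.
    static boolean containsLigaturePair(String text) {
        for (String pair : LIGATURE_PAIRS) {
            if (text.contains(pair)) {
                return true;
            }
        }
        return false;
    }

    // Expands Unicode ligature characters (e.g. U+FB01) into their
    // component letters using NFKC compatibility normalization.
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(containsLigaturePair("Certificate"));   // true ("fi")
        System.out.println(containsLigaturePair("Customer name")); // false
        System.out.println(expandLigatures("Certi\uFB01cate"));    // Certificate
    }
}
```

A check like containsLigaturePair could be run over the list of placeholder
strings before the template is handed to the designer, so that risky
placeholders are renamed up front.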

That's the first issue I've ever looked at that had to do with printing
presses.

Thanks for the PDFBox library, it's very useful!

Best regards,

Candace


On Mon, Feb 24, 2014 at 1:26 PM, Candace Bain <candace.bain@gmail.com> wrote:

> I'm using PDFBox to programmatically create a PDF file by finding and
> replacing text in a template PDF file.  The template was created by someone
> in a different department.  This worked correctly in a previous version of
> the software, but we've added some new text for the next version of the
> software that is not working.
>
> The problem seems to be that the text in the previous version of the file
> is using the Imago font with Ansi encoding, whereas the text that was added
> to the newer version of the file is using the Imago font with Identity-H
> encoding.
>
> This file was created with Adobe InDesign, and I am not familiar enough
> with the product to know how to ensure that the fonts in the exported PDF
> file only use Ansi encoding.  Is this possible, or is it possible to
> process the template with another application to make sure we're using a
> font with Ansi encoding?
>
> I've attached the template file that is causing the problem in case that
> is useful,
>
> Best regards,
>
> Candace
>
