pdfbox-users mailing list archives

From John Hewson <j...@jahewson.com>
Subject Re: font errors when reading PDF (not writing)
Date Wed, 25 Feb 2015 21:27:58 GMT
Are you running on a headless system, such as a server? If so, you probably don’t have any
fonts installed. Even though you’re just doing text extraction, this matters because the
dimensions of the characters need to be taken into account and many PDFs do not embed the
fonts which they depend on.

At a bare minimum I’d recommend installing the Liberation fonts and whichever Microsoft
fonts are available in your distribution’s package manager.
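
To see whether the machine has any font files at all, something like the following can be run on the server. This is only a rough sketch: the class name FontCheck and the directory list are assumptions (typical Linux font locations), it is not PDFBox's own font lookup, and it only counts files ending in .ttf or .otf.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox.
class FontCheck {

    // Assumed font locations on a typical Linux system; adjust for your distribution.
    static final String[] ASSUMED_FONT_DIRS = {
        "/usr/share/fonts",
        "/usr/local/share/fonts",
        System.getProperty("user.home") + "/.fonts"
    };

    // Recursively collects .ttf and .otf files under dir.
    // Returns an empty list if dir does not exist or is not a directory.
    static List<Path> findFontFiles(Path dir) throws IOException {
        final List<Path> found = new ArrayList<Path>();
        if (!Files.isDirectory(dir)) {
            return found;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
                if (name.endsWith(".ttf") || name.endsWith(".otf")) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip entries that cannot be read instead of aborting the scan.
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (String dir : ASSUMED_FONT_DIRS) {
            int count = findFontFiles(Paths.get(dir)).size();
            System.out.println(dir + ": " + count + " font file(s)");
        }
    }
}
```

If every directory reports zero files, that is consistent with a headless image that ships without fonts.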

— John

> On 25 Feb 2015, at 06:12, Juan M Uys <opyate@gmail.com> wrote:
> Hello,
> I'm extracting text from PDFs using PDFTextStripperByArea and get a lot of
> these in the log:
> Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.ExternalFonts getTrueTypeFallbackFont
> SEVERE: No TTF fallback font for 'Helvetica'
> Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
> WARNING: Using fallback font 'LiberationSans' for 'ArialMT'
> I've searched the documentation for font-related advice, which seems to
> pertain to WRITING PDFs, whereas I'm merely extracting text.
> Please let me know how to get around this problem.
> Do I need to install extra font packages?
> If so, how? Where from?
> At the very least, I'd like to know how to remove these statements from my
> log. (I've tried throwing logback.xml and log4j.properties into my
> resources folder, setting package org.apache.pdfbox to INFO, to no avail)
> The system running my extractor code is stock Ubuntu 14.04 with Azul
> openjdk 7 (see
> https://registry.hub.docker.com/u/azul/zulu-openjdk/dockerfile/)
> Thanks,
> Juan
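
On the logging question: the layout of the messages quoted above (timestamp, class and method, then a SEVERE: or WARNING: line) looks like java.util.logging output, which would explain why logback.xml and log4j.properties have no effect. PDFBox logs through commons-logging, which can fall back to java.util.logging when no other backend is found. Assuming that is what is happening here, a minimal sketch for silencing the font package follows. The class name QuietPdfBoxLogging is hypothetical, and the logger name is taken from the package shown in the log lines.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of PDFBox.
class QuietPdfBoxLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an unreferenced logger can be lost after garbage collection.
    private static final Logger FONT_LOGGER =
            Logger.getLogger("org.apache.pdfbox.pdmodel.font");

    // Turns off all java.util.logging output from the PDFBox font package.
    // Level.OFF is used because the fallback message is logged at SEVERE,
    // the highest regular level.
    static void silenceFontMessages() {
        FONT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silenceFontMessages();
        // Loggers for classes in that package inherit the level from the parent.
        Logger child = Logger.getLogger("org.apache.pdfbox.pdmodel.font.ExternalFonts");
        System.out.println("SEVERE still logged: " + child.isLoggable(Level.SEVERE));
    }
}
```

Call silenceFontMessages() once at startup, before any PDF is loaded. This only hides the messages; text positions can still be affected by missing fonts, so installing fonts remains the actual fix.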
