pdfbox-users mailing list archives

From Ivan Klaric <ikla...@gmail.com>
Subject PDFBox 2.0.0 and UTF8 chars
Date Sat, 28 Feb 2015 10:52:56 GMT
Hello good PDFBox people,

I am working on a pet project with PDFBox and I encountered what seems to
be an issue with UTF8 chars. If you take the following standard example:

    public static void main(String[] args) {
        try {
            PDDocument document = new PDDocument();
            PDPage page = new PDPage();
            document.addPage( page );
            PDFont font = PDType0Font.load(document, new File("res/Roboto-Regular.ttf"));
            PDPageContentStream contentStream = null;
            contentStream = new PDPageContentStream(document, page);
            contentStream.beginText();
            contentStream.setFont( font, 12 );
            contentStream.moveTextPositionByAmount( 100, 700 );
            contentStream.drawString( "Hello World čćžšđČĆŽŠĐ" );
            contentStream.endText();
            contentStream.close();
            document.save( "/tmp/HelloWorld.pdf");
            document.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

(The unusual characters in the drawString call are common Croatian
letters.) This is what I get:
java.io.IOException: Error: Could not find referenced cmap stream Identity-H
    at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:418)
    at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:84)
    at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:159)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:119)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.load(PDType0Font.java:59)
    at com.company.Main.main(Main.java:20)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)


Am I doing something wrong? I downloaded the Roboto-Regular font from here:
http://www.fontsquirrel.com/fonts/roboto

If I remove the Croatian characters, the error remains the same.
However, if I load the font with PDTrueTypeFont.loadTTF() instead (which
seems to be deprecated), the same code works as long as the Croatian
characters are left out. If I put the Croatian characters back in (and
still use PDTrueTypeFont), I get

Exception in thread "main" java.lang.IllegalArgumentException: U+010D is not available in this font's Encoding
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:261)
    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:268)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:316)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.drawString(PDPageContentStream.java:282)
    at com.company.Main.main(Main.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

I manually looked into the font file and it seems to contain the U+010D
character. What am I doing wrong here?
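
For reference, here is a small sketch in plain Java (no PDFBox involved; the
class name CodePointDump and the describe() helper are made up for
illustration) that lists the Unicode code points of the letters in question.
It only shows which letter the U+010D in the exception refers to; it says
nothing about how PDFBox encodes text internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of PDFBox.
class CodePointDump {

    // Formats each code point of the input as "U+XXXX".
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // The Croatian letters from the example (čćžšđČĆŽŠĐ), written as
        // escapes so the result does not depend on the source file encoding.
        String croatian =
            "\u010D\u0107\u017E\u0161\u0111\u010C\u0106\u017D\u0160\u0110";
        System.out.println(describe(croatian));
    }
}
```

The first entry printed is U+010D, i.e. the letter č, which is the first
Croatian letter in the string passed to drawString.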

Thanks,
Ivan
