pdfbox-users mailing list archives

From Ivan Klaric <ikla...@gmail.com>
Subject Re: PDFBox 2.0.0 and UTF8 chars
Date Sat, 07 Mar 2015 08:14:52 GMT
Thank you all, will look into it and report back as soon as I do.

On Thu, Mar 5, 2015 at 8:45 PM John Hewson <john@jahewson.com> wrote:

> Our ant build is obsolete, use maven instead:
>
> mvn clean install
>
> I don’t know what the status of the ant build is, if perhaps it will be
> removed? We don’t really need two build systems...
>
> — John
>
> > On 4 Mar 2015, at 11:23, Ivan Klaric <iklaric@gmail.com> wrote:
> >
> > OK, I'll focus on the PDType0Font.load version then. I build my pdfbox like
> > this:
> >
> > svn update && ant clean && ant build
> >
> > and then copy fontbox-2.0.0.jar & pdfbox-2.0.0.jar from the target folder
> > to my project's lib folder. This stack trace:
> > java.io.IOException: Error: Could not find referenced cmap stream Identity-H
> > at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:418)
> > at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:84)
> > at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
> > at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:159)
> > at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:119)
> > at org.apache.pdfbox.pdmodel.font.PDType0Font.load(PDType0Font.java:59)
> > at com.company.Main.main(Main.java:20)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:483)
> > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> >
> > is what I get when using PDType0Font.load() and the jars I get out of the
> > ant build. When I use pdfbox.jar from
> > https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.0-SNAPSHOT/
> >
> > I get the same error. Note that there is no fontbox.jar in that folder, so
> > I get an exception that points to the fact that fontbox.jar is missing:
> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/fontbox/ttf/TTFParser
> > at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.buildFontFile2(TrueTypeEmbedder.java:90)
> > at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.<init>(TrueTypeEmbedder.java:72)
> > at org.apache.pdfbox.pdmodel.font.PDTrueTypeFontEmbedder.<init>(PDTrueTypeFontEmbedder.java:56)
> > at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:184)
> > at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadTTF(PDTrueTypeFont.java:81)
> > at com.company.Main.main(Main.java:20)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:483)
> > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> > Caused by: java.lang.ClassNotFoundException: org.apache.fontbox.ttf.TTFParser
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> > ... 11 more
> >
> >
> > When I use the fontbox.jar I got from the ant build above, the error is
> > exactly the same as when using the jars I built with ant. Also, note that
> > it seems to work fine when I remove the Croatian characters from the string
> > and add e.g. German ones (stuff like ü, ä, ß...).
> >
> > Thanks,
> > Ivan
> >
> > On Wed, Mar 4, 2015 at 7:07 PM John Hewson <john@jahewson.com> wrote:
> >
> >> Hi,
> >>
> >>> On 28 Feb 2015, at 02:52, Ivan Klaric <iklaric@gmail.com> wrote:
> >>>
> >>> Hello good PDFBox people,
> >>>
> >>> I am working on a pet project with PDFBox and I encountered what seems to
> >>> be an issue with UTF8 chars. If you take the following standard example:
> >>>
> >>>   public static void main(String[] args) {
> >>>       try {
> >>>           PDDocument document = new PDDocument();
> >>>           PDPage page = new PDPage();
> >>>           document.addPage( page );
> >>>           PDFont font = PDTrueTypeFont.loadTTF(document, new File("res/Roboto-Regular.ttf"));
> >>>           PDPageContentStream contentStream = null;
> >>>           contentStream = new PDPageContentStream(document, page);
> >>>           contentStream.beginText();
> >>>           contentStream.setFont( font, 12 );
> >>>           contentStream.moveTextPositionByAmount( 100, 700 );
> >>>           contentStream.drawString( "Hello World čćžšđČĆŽŠĐ" );
> >>>           contentStream.endText();
> >>>           contentStream.close();
> >>>           document.save( "/tmp/HelloWorld.pdf");
> >>>           document.close();
> >>>
> >>>       } catch (IOException e) {
> >>>           e.printStackTrace();
> >>>       }
> >>>   }
> >>>
> >>> (those weird characters in the drawString method are some pretty common
> >>> Croatian letters). This is what I get:
> >>> java.io.IOException: Error: Could not find referenced cmap stream Identity-H
> >>> at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:418)
> >>> at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:84)
> >>> at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
> >>> at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:159)
> >>> at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:119)
> >>> at org.apache.pdfbox.pdmodel.font.PDType0Font.load(PDType0Font.java:59)
> >>> at com.company.Main.main(Main.java:20)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>> at java.lang.reflect.Method.invoke(Method.java:483)
> >>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> >>
> >> There’s something wrong with how you built PDFBox, as this error means it
> >> can’t find resources which we ship in the jar file. Try doing a
> >> "mvn clean install" or using a snapshot jar instead. (Did you build using
> >> an IDE or Ant perhaps?)
> >>
> >>> Am I doing something wrong? I took the Roboto-Regular font here:
> >>> http://www.fontsquirrel.com/fonts/roboto
> >>>
> >>> If I remove the weird Croatian characters, the error remains the same.
> >>> However, if I use the PDTrueTypeFont.loadTTF() (which seems to be
> >>> deprecated) the same thing works without the Croatian characters. If I put
> >>> the Croatian characters back in (and use PDTrueTypeFont), I get
> >>>
> >>> Exception in thread "main" java.lang.IllegalArgumentException: U+010D is not available in this font's Encoding
> >>> at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:261)
> >>> at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:268)
> >>> at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:316)
> >>> at org.apache.pdfbox.pdmodel.PDPageContentStream.drawString(PDPageContentStream.java:282)
> >>> at com.company.Main.main(Main.java:25)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>> at java.lang.reflect.Method.invoke(Method.java:483)
> >>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> >>
> >> PDTrueTypeFont.loadTTF is deprecated and only supports ANSI. For full
> >> Unicode support, use PDType0Font.load, as explained in the @deprecated
> >> JavaDoc tag.
> >>
> >>> I manually looked into the font file and it seems to contain the U+010D
> >>> character. What am I doing wrong here?
> >>>
> >>> Thanks,
> >>> Ivan
> >>
> >> — John
>
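John's diagnosis in the thread is that the "Could not find referenced cmap stream Identity-H" error means a resource normally shipped in the jar is missing from the build. One way to check that without PDFBox on the classpath is to list the jar's entries. The sketch below uses only the JDK; the class name JarResourceCheck and the method entriesEndingWith are made up for illustration, and it matches on a suffix because the thread does not state the exact resource path inside the jar.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of PDFBox or FontBox.
public class JarResourceCheck {

    /** Lists the names of all entries in the jar whose path ends with the given suffix. */
    public static List<String> entriesEndingWith(String jarPath, String suffix) throws IOException {
        // try-with-resources closes the jar after the stream has been fully collected
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(JarEntry::getName)
                      .filter(name -> name.endsWith(suffix))
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarResourceCheck <jar> <suffix>");
            return;
        }
        List<String> matches = entriesEndingWith(args[0], args[1]);
        if (matches.isEmpty()) {
            System.out.println("No entry ending with '" + args[1] + "' in " + args[0]);
        } else {
            matches.forEach(System.out::println);
        }
    }
}
```

Running it as "java JarResourceCheck lib/fontbox-2.0.0.jar Identity-H" (the lib/ path is an assumption based on the setup described in the thread) should print a matching entry for a complete build. If it prints nothing for the Ant-built jar but finds an entry in a Maven-built one, that would be consistent with the Ant build leaving out the resources.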
>
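The thread also notes that PDTrueTypeFont.loadTTF "only supports ANSI", and that German letters work while Croatian ones fail on U+010D. A rough way to see why is to ask which characters fit in a single-byte Western code page. The sketch below uses the JDK's windows-1252 charset as a stand-in for "ANSI"; that substitution is an assumption, and this is not how PDFBox performs its own encoding check. The class name AnsiCoverage and the method unencodable are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox.
public class AnsiCoverage {

    // Assumption: windows-1252 approximates the "ANSI" repertoire mentioned in the thread.
    private static final Charset ANSI = Charset.forName("windows-1252");

    /** Returns the distinct characters of text that windows-1252 cannot encode, in order of first appearance. */
    public static String unencodable(String text) {
        CharsetEncoder encoder = ANSI.newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch) && missing.indexOf(ch) < 0) {
                missing.append(ch);
            }
            i += Character.charCount(cp);
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's file encoding.
        String[] samples = {
            "Hello World",
            "\u00fc\u00e4\u00df",                                             // German letters from the thread
            "\u010d\u0107\u017e\u0161\u0111\u010c\u0106\u017d\u0160\u0110"   // Croatian letters from the thread
        };
        for (String sample : samples) {
            StringBuilder codes = new StringBuilder();
            unencodable(sample).codePoints()
                    .forEach(cp -> codes.append(String.format("U+%04X ", cp)));
            System.out.println(sample.length() + " chars, not encodable: "
                    + (codes.length() == 0 ? "none" : codes.toString().trim()));
        }
    }
}
```

Under this assumption the German sample is fully encodable, while the Croatian sample is not, starting with U+010D, the same code point named in the IllegalArgumentException above. That fits John's advice to use PDType0Font.load for text outside the single-byte range.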
