pdfbox-dev mailing list archives

From "John Hewson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PDFBOX-922) True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)
Date Mon, 02 Jun 2014 18:02:04 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015630#comment-14015630 ]

John Hewson edited comment on PDFBOX-922 at 6/2/14 6:01 PM:
------------------------------------------------------------

It shouldn't make any difference - the ToUnicode map defines the mapping to Unicode, not the
character codes embedded in the content stream. Of course there's no harm in using UTF-16
(with the BOM you mention) instead of PDFDocEncoding, but be aware that PDF readers use the
ToUnicode map, as long as it's present.
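
For anyone unsure what bytes are being discussed, here is a minimal sketch. It assumes
"UTF-16 with the BOM" means big-endian UTF-16 prefixed with the byte order mark FE FF.
The class and method names are made up for illustration and are not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomSketch {
    /** Encodes text as UTF-16BE, prefixed with the byte order mark FE FF. */
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE; // first BOM byte
        out[1] = (byte) 0xFF; // second BOM byte
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeWithBom("A")) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints "FE FF 00 41"
    }
}
```

Whichever byte form is chosen, the point above still holds: when a ToUnicode map is
present, that map is what readers consult.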

I should add that it isn't always possible to use Unicode for glyph encoding, because not
every glyph has a unique Unicode code point. For example, a font may include a set of normal
characters and a set of small caps characters, but only one of these can map to the Unicode
"A" character. The other is forced to map to some other code. This is why GIDs are typically
used with TrueType fonts: each glyph is guaranteed a unique GID, and the
ToUnicode map can be used to map both the normal "A" and the small caps "A" to Unicode "A".

In other words: Unicode code point != glyph
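
To make the many-to-one relationship concrete, here is a toy sketch in plain Java. It does
not use the PDFBox API. The glyph IDs 36 and 500 are hypothetical values chosen for
illustration (real GIDs depend on the font file), and the map is only a stand-in for a
ToUnicode CMap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GlyphMappingSketch {
    // Hypothetical glyph IDs; real values depend on the font file.
    static final int GID_NORMAL_A = 36;
    static final int GID_SMALL_CAP_A = 500;

    /** Builds a toy ToUnicode-style table: glyph ID -> Unicode text. */
    static Map<Integer, String> buildToUnicode() {
        Map<Integer, String> toUnicode = new LinkedHashMap<>();
        toUnicode.put(GID_NORMAL_A, "A");    // normal "A"
        toUnicode.put(GID_SMALL_CAP_A, "A"); // small caps "A", same Unicode text
        return toUnicode;
    }

    /** Text extraction direction: every GID resolves to exactly one Unicode string. */
    static String extractText(int[] gids, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int gid : gids) {
            sb.append(toUnicode.getOrDefault(gid, "\uFFFD"));
        }
        return sb.toString();
    }

    /** Reverse direction: one Unicode string may correspond to several glyphs. */
    static List<Integer> glyphsFor(String unicode, Map<Integer, String> toUnicode) {
        List<Integer> gids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : toUnicode.entrySet()) {
            if (e.getValue().equals(unicode)) {
                gids.add(e.getKey());
            }
        }
        return gids;
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = buildToUnicode();
        System.out.println(extractText(new int[] {GID_NORMAL_A, GID_SMALL_CAP_A}, toUnicode)); // AA
        System.out.println(glyphsFor("A", toUnicode)); // [36, 500]
    }
}
```

GID to Unicode is a function, but Unicode to GID is not, which is why the character codes
in the content stream cannot always be Unicode values.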


was (Author: jahewson):
It shouldn't make any difference - the ToUnicode map defines the mapping to Unicode, not the
character codes embedded in the content stream. Of course there's no harm in using UTF-16
(with the BOM you mention) instead of PDFDocEncoding, but be aware that PDF readers use the
ToUnicode map, as long as it's present.

I should add that it isn't always possible to use Unicode for glyph encoding, because not
every glyph has a unique Unicode code point. For example, a font may include a set of normal
characters and a set of small caps characters, but only one of these can map to the Unicode
"A" character. The other is forced to map to some other code, which is why GIDs are typically
used with TrueType fonts, because we can guarantee that each glyph has a unique GID and the
ToUnicode map can be used to map both the normal "A" and small cap "A" to Unicode "A".

> True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-922
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-922
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Writing
>    Affects Versions: 1.3.1
>         Environment: JDK 1.6 / OS irrelevant, tried against 1.3.1 and 1.2.0
>            Reporter: Thanos Agelatos
>            Assignee: Andreas Lehmkühler
>
> PDFBox cannot embed Identity-H or Identity-V type TTF fonts in the PDF it creates, making
> it impossible to create PDFs in any language apart from English and ones supported in WinAnsiEncoding.
> This behaviour occurs because the method PDTrueTypeFont.loadTTF has WinAnsiEncoding hardcoded
> inside, and there are no Identity-H or Identity-V Encoding classes provided (to set afterwards
> via PDFont.setFont() )
> This excludes the following languages plus many others:
> - Greek
> - Bulgarian
> - Swedish
> - Baltic languages
> - Maltese
> The PDF created contains garbled characters and/or squares.
> Simple test case:
> 		PDDocument doc = null;
> 		try {
> 			doc = new PDDocument();
> 			PDPage page = new PDPage();
> 			doc.addPage(page);
> 			// extract fonts for fields
> 			byte[] arialNorm = extractFont("arial.ttf");
> 			//byte[] arialBold = extractFont("arialbd.ttf"); 
> 			//PDFont font = PDType1Font.HELVETICA;
> 			PDFont font = PDTrueTypeFont.loadTTF(doc, new ByteArrayInputStream(arialNorm));
> 			
> 			PDPageContentStream contentStream = new PDPageContentStream(doc, page);
> 			contentStream.beginText();
> 			contentStream.setFont(font, 12);
> 			contentStream.moveTextPositionByAmount(100, 700);
> 			// the text below may appear garbled; insert any text in Greek, Bulgarian or Maltese
> 			contentStream.drawString("Hello world from PDFBox ελληνικά");
> 			contentStream.endText();
> 			contentStream.close();
> 			doc.save("pdfbox.pdf");
> 			System.out.println(" created!");
> 		} catch (Exception ioe) {
> 			ioe.printStackTrace();
> 		} finally {
> 			if (doc != null) {
> 				try { doc.close(); } catch (Exception e) {}
> 			}
> 		}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
