pdfbox-users mailing list archives

From "ropo@ropo.de" <r...@ropo.de>
Subject Writing non-ansi characters into form fields
Date Wed, 22 Mar 2017 11:52:50 GMT
Hello,
 
I am using pdfbox 2.0.5 to fill out form fields of a PDF document using this code:
        doc = PDDocument.load(inputStream);
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        for (PDField field : form.getFieldTree()){
            field.setValue("должен");
        }
        
I get this error:
U+0434 ('afii10069') is not available in this font Times-Roman (generic: TimesNewRomanPSMT) encoding: StandardEncoding with differences

I can create the PDF documents any way I like. I have tried exporting from MS Office as Adobe PDF
and creating them directly with Acrobat Pro DC. When creating the fields in Acrobat I can select
a font. I tried all kinds of fonts; for "Arial Unicode MS" it wants to download a 50MB "Adobe
Acrobat Reader DC Font Pack". The final PDF file with the filled-out form fields should be
viewable and printable by anyone without first installing a font pack.

The PDF document itself contains Cyrillic text, which is displayed just fine. Filling out the
form in Acrobat Reader works flawlessly; the only problem is in PDFBox.

According to https://issues.apache.org/jira/browse/PDFBOX-3138: "The embedded font used by the
field does indeed contain Hebrew glyphs, and a valid "cmap" table which can be used to look
up those glyphs. The mentioned character, U+05D7, is indeed present in the font. The embedded
font file is in OpenType format, however the PDF Font dictionary is Type1 and specifies WinAnsiEncoding,
which does not include Hebrew characters. So, strictly speaking, the field cannot be filled
using any non-ANSI characters and so PDFBox's behaviour is correct."
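To make the encoding limitation concrete, here is a minimal JDK-only sketch that checks whether a value would fit into a single-byte ANSI encoding before calling setValue(). It is an approximation under stated assumptions: the class and method names are hypothetical, and windows-1252 is used as a stand-in for PDF WinAnsiEncoding, which is close but not identical. It also does not model the "StandardEncoding with differences" reported in the error above.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox.
public class AnsiCheck {
    // Assumption: windows-1252 approximates PDF WinAnsiEncoding.
    private static final Charset WIN_ANSI_APPROX = Charset.forName("windows-1252");

    // True if every character of the value can be encoded in windows-1252.
    static boolean fitsWinAnsi(String value) {
        CharsetEncoder encoder = WIN_ANSI_APPROX.newEncoder();
        return encoder.canEncode(value);
    }

    // Returns the first code point that cannot be encoded, or -1 if all fit.
    static int firstUnencodable(String value) {
        CharsetEncoder encoder = WIN_ANSI_APPROX.newEncoder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                return cp;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // The Cyrillic test value from the message, written with escapes
        // so the source file encoding does not matter.
        String value = "\u0434\u043e\u043b\u0436\u0435\u043d";
        System.out.println(fitsWinAnsi(value));
        System.out.printf("U+%04X%n", firstUnencodable(value));
    }
}
```

A check like this only tells me in advance which values will be rejected; it does not solve the problem of getting the Cyrillic text into the field.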

I tried another approach: instead of setValue() I called ((PDTextField) field).setDefaultValue().
It does not throw an exception, but unfortunately the resulting PDF still shows the previous
default value in the document. The new default value only appears in the properties of the
field.

Using this code I see that the font is a PDTrueTypeFont:
// The default appearance (DA) string looks like "/Helv 12 Tf 0 g"
String da = field.getCOSObject().getString(COSName.DA);
// Capture the font resource name that precedes the font size and the Tf operator
Matcher m = Pattern.compile("/(\\S+)\\s+[\\d.]+\\s+Tf").matcher(da);
String name = m.find() ? m.group(1) : null;
PDFont font = field.getAcroForm().getDefaultResources().getFont(COSName.getPDFName(name));
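The font-name extraction from the DA string can be tried without PDFBox. The following is a sketch only: the class name DaFontName and its parse method are hypothetical, and the pattern assumes the usual "/Name size Tf" form of a default appearance string, possibly preceded or followed by other operators such as a colour setting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for experimenting with DA strings outside PDFBox.
public class DaFontName {
    // "/Name", whitespace, a numeric font size, whitespace, then the Tf operator.
    private static final Pattern FONT_OP =
            Pattern.compile("/([^\\s/]+)\\s+-?[\\d.]+\\s+Tf");

    // Returns the font resource name without the leading slash, or null if none is found.
    static String parse(String da) {
        if (da == null) {
            return null;
        }
        Matcher m = FONT_OP.matcher(da);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("/Helv 12 Tf 0 g"));
        System.out.println(parse("0 g /TiRo 0 Tf"));
    }
}
```

Anchoring the capture on the slash keeps operators that come before the font (for example "0 g") out of the extracted name.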

How can I create the PDF document and use PDFBox to fill out the form with non-ANSI characters?

Thanks,
Roland

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org