pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Writing non-ansi characters into form fields
Date Thu, 23 Mar 2017 17:44:56 GMT
Is there a font that contains only Cyrillic glyphs? If you embed it, does
it work with PDFBox?

Tilman

On 22.03.2017 at 12:52, ropo@ropo.de wrote:
> Hello,
>   
> I am using pdfbox 2.0.5 to fill out form fields of a PDF document using this code:
>          doc = PDDocument.load(inputStream);
>          PDDocumentCatalog catalog = doc.getDocumentCatalog();
>          PDAcroForm form = catalog.getAcroForm();
>          for (PDField field : form.getFieldTree()){
>              field.setValue("должен");
>          }
>          
> I get this error: U+0434 ('afii10069') is not available in this font
> Times-Roman (generic: TimesNewRomanPSMT) encoding: StandardEncoding with
> differences
>
> I can create the PDF documents any way I like. I have tried exporting from
> MS Office as Adobe PDF and creating them directly with Acrobat Pro DC. When
> creating the fields in Acrobat I can select a font. I tried all kinds of
> fonts; for "Arial Unicode MS" it wants to download a 50MB "Adobe Acrobat
> Reader DC Font Pack". The final PDF file with the filled-out form fields
> should be viewable and printable by anyone without first installing a font
> pack.
>
> The PDF document itself contains Cyrillic text, which is displayed just
> fine. Filling out the form in Acrobat Reader works flawlessly; the only
> problem is in PDFBox.
>
> According to https://issues.apache.org/jira/browse/PDFBOX-3138: "The
> embedded font used by the field does indeed contain Hebrew glyphs, and a
> valid "cmap" table which can be used to look up those glyphs. The mentioned
> character, U+05D7, is indeed present in the font. The embedded font file is
> in OpenType format, however the PDF Font dictionary is Type1 and specifies
> WinAnsiEncoding, which does not include Hebrew characters. So, strictly
> speaking, the field cannot be filled using any non-ANSI characters and so
> PDFBox's behaviour is correct."
>
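The encoding limit described in that JIRA comment can be illustrated without PDFBox. The following is only a rough sketch: it assumes that Java's windows-1252 charset is a close enough stand-in for PDF's WinAnsiEncoding (the two are similar but not identical), and the class and method names are made up for this example. It lists the characters of a string that such an 8-bit encoding cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class WinAnsiCheck {
    // Assumption: windows-1252 approximates PDF WinAnsiEncoding; it is not the same table.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns the code points of text that windows-1252 cannot encode, as "U+XXXX" strings. */
    public static List<String> unencodable(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (!encoder.canEncode(s)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // The Russian word from the original code, written with Unicode escapes
        // so the source compiles regardless of the file encoding.
        String value = "\u0434\u043e\u043b\u0436\u0435\u043d";
        System.out.println(unencodable(value));
    }
}
```

Every character of the test word is reported as missing, starting with U+0434, the same code point named in the PDFBox error message. A check like this only says whether the text could fit a WinAnsi-style font dictionary; it says nothing about which glyphs the embedded font file actually contains.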
> I tried another approach: instead of setValue() I called
> ((PDTextField)field).setDefaultValue(). It does not throw an exception, but
> unfortunately the resulting PDF still shows the previous default value in
> the document. The new default value only appears in the properties of the
> field.
>
> Using this code I see that the font is a PDTrueTypeFont:
> String  da      = field.getCOSObject().getString(COSName.DA.getName());
> Matcher m       = Pattern.compile("/?(.*) [\\d]+ Tf.*", Pattern.CASE_INSENSITIVE).matcher(da);
> String  name    = m.find() ? m.group(1) : null;
> PDFont  font    = field.getAcroForm().getDefaultResources().getFont(COSName.getPDFName(name));
>
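A note on the pattern above: as far as I can tell, the greedy `(.*)` would also capture any operators that precede the font name, so a DA string such as "0 g /Helv 12 Tf" would yield "0 g /Helv". Below is a minimal, PDFBox-free sketch of a stricter extraction. The class and method names are hypothetical, and it assumes the font name contains no whitespace and ignores `#xx` escapes in PDF names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DaFontName {
    // Matches "/Name size Tf", where size is an integer or a decimal number.
    private static final Pattern FONT_OP = Pattern.compile("/(\\S+)\\s+[\\d.]+\\s+Tf");

    /** Returns the font resource name from a default appearance string, or null if none is found. */
    public static String fontName(String da) {
        if (da == null) {
            return null;
        }
        Matcher m = FONT_OP.matcher(da);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fontName("/Helv 12 Tf 0 g"));
        System.out.println(fontName("0 g /TiRo 9.5 Tf"));
    }
}
```

The returned name can then be passed to COSName.getPDFName(name) exactly as in the quoted code.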
> How can I create the PDF document and use PDFBox to fill out the form with non-ansi characters?
>
> Thanks,
> Roland
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>

