pdfbox-dev mailing list archives

From "John Hewson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-3138) PDTextField doesn't accept any Hebrew characters as new value
Date Tue, 01 Dec 2015 17:34:10 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034142#comment-15034142 ]

John Hewson commented on PDFBOX-3138:
-------------------------------------

The embedded font used by the field does indeed contain Hebrew glyphs, and a valid "cmap"
table which can be used to look up those glyphs. The mentioned character, U+05D7, is indeed
present in the font.

The embedded font file is in OpenType format, but the PDF Font dictionary is Type1 and
specifies WinAnsiEncoding, which does not include Hebrew characters. So, strictly speaking,
the field cannot be filled using any non-ANSI characters, and PDFBox's behaviour is correct.
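
To illustrate the encoding restriction, here is a minimal sketch that does not use PDFBox at all.
It assumes that Java's windows-1252 charset is a close enough stand-in for WinAnsiEncoding (the
two are not identical in every code point), and the class and method names are hypothetical,
chosen only for this example:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox.
final class WinAnsiCheck {
    // Assumption: windows-1252 approximates the PDF WinAnsiEncoding.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns the first code point that windows-1252 cannot encode, or -1 if all can be encoded.
    static int firstUnencodable(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String single = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(single)) {
                return codePoint;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        int codePoint = firstUnencodable("\u05D7");
        System.out.println(codePoint < 0
                ? "encodable"
                : String.format("not encodable: U+%04X", codePoint));
        // prints "not encodable: U+05D7"
    }
}
```

A check along these lines only shows that the encoding named in the font dictionary has no slot
for U+05D7. It says nothing about whether the embedded font program contains the glyph, which in
this case it does.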

It would seem that PDFBox could do something more helpful in this instance. Filling the form
with Acrobat results in the font from the form's DR being overridden in the Field itself with
a new CIDFontType0 which has been created from the DR font. Ideally we would do that.

Do you have any control over the software producing these fields? I might be able to offer
a workaround.

> PDTextField doesn't accept any Hebrew characters as new value
> -------------------------------------------------------------
>
>                 Key: PDFBOX-3138
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3138
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, FontBox
>    Affects Versions: 2.0.0
>         Environment: Eclipse 4.2.2, Windows 7 Pro, JRE 1.8.0_05
>            Reporter: Gilad Denneboom
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: SetHebrewFieldValueTest.java, Test.pdf, Test.txt
>
>
> Trying to set a UTF-8 encoded Hebrew string as the value of a PDTextField fails with the following exception:
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+05D7 in font AdobeHebrew-Regular
> 	at org.apache.pdfbox.pdmodel.font.PDType1CFont.encode(PDType1CFont.java:300)
> 	at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:283)
> 	at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:341)
> 	at org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:213)
> 	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:373)
> 	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:237)
> 	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:144)
> 	at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
> 	at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:221)
> 	at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
> 	at SetHebrewFieldValueTest.main(SetHebrewFieldValueTest.java:22)
> {code}
> I've tried using multiple fonts for the field, all of which can handle Hebrew characters just fine, and got the same results in all of them.
> See attached files for a demonstration of the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org

