pdfbox-users mailing list archives

From <Pasi.Ko...@tieto.com>
Subject FW: Non-ASCII characters messed up in AcroForm (PDFBox 1.8.4)
Date Tue, 15 Apr 2014 05:15:33 GMT

I was able to fix the problem by applying the patch attached to PDFBOX-283.

So far I have not noticed any unwanted side effects.

I have added my vote to have this patch applied to trunk.


Pasi Koski

From: Koski Pasi
Sent: 11 April 2014 11:09
To: 'users@pdfbox.apache.org'
Subject: Non-ASCII characters messed up in AcroForm (PDFBox 1.8.4)


I'm working on a server-side Java application that produces PDF forms pre-filled by the
application. These documents are delivered to the end user via a browser interface, after
which the end user continues editing the forms. Usually the forms are then printed by the
end user or just saved electronically. The application does not need to process the user
input any further, although this may become a requirement in the future.

The problem is with displaying non-ASCII characters in editable fields. When the value the
application writes into a form field contains non-ASCII characters, those characters are not
displayed correctly when the document is opened in a PDF viewer. However, when the field is
selected, the content is displayed correctly. If the user changes the data, it continues to
display correctly after another field is selected, but if the data is left unchanged, the
non-ASCII characters return to their garbled state as soon as the user leaves the field.
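One explanation that fits this behaviour, offered as an assumption rather than a diagnosis: viewers usually draw an unfocused field from its stored appearance stream (the widget's /AP entry), but re-render the text from the field value while the field is being edited. If the text bytes written into the appearance stream do not match the encoding of the font that stream uses, only the unfocused display is garbled. The following JDK-only sketch shows what a consistently encoded appearance could look like, assuming a simple font with the resource name Helv and WinAnsiEncoding (approximated here by the windows-1252 charset). The class and method names are hypothetical; this is not PDFBox code.

```java
import java.nio.charset.Charset;

public class AppearanceTextSketch {

    // Hypothetical helper: converts Unicode text into a PDF literal string whose bytes are
    // single-byte codes in the font's encoding. Characters the charset cannot represent
    // are replaced by '?' by String.getBytes.
    static String toLiteralString(String text, Charset fontEncoding) {
        StringBuilder sb = new StringBuilder("(");
        for (byte b : text.getBytes(fontEncoding)) {
            int code = b & 0xFF;
            if (code == '(' || code == ')' || code == '\\') {
                sb.append('\\').append((char) code);      // escape string delimiters
            } else if (code < 0x20 || code > 0x7E) {
                sb.append(String.format("\\%03o", code)); // octal escape, e.g. \304 for the A-umlaut
            } else {
                sb.append((char) code);                   // printable ASCII as-is
            }
        }
        return sb.append(')').toString();
    }

    // Hypothetical helper: a minimal normal-appearance content stream for a text field,
    // assuming "Helv" is a font resource available to the appearance stream.
    static String appearanceStream(String value) {
        String literal = toLiteralString(value, Charset.forName("windows-1252"));
        return "/Tx BMC\nq\nBT\n/Helv 12 Tf\n0 g\n2 4 Td\n" + literal + " Tj\nET\nQ\nEMC\n";
    }

    public static void main(String[] args) {
        // The sample value from this message, written with Unicode escapes
        System.out.print(appearanceStream("\u00C4\u00C5\u00D6\u00F6\u00E4\u00E5"));
    }
}
```

For the sample value the text-showing line becomes `(\304\305\326\366\344\345) Tj`: one byte per glyph, matching a single-byte font encoding.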

I'm using PDFBox 1.8.4, but I had the same problem with the previous version (1.8.3). I have
not tried earlier versions.

Can anyone tell me whether non-ASCII characters are supposed to work in an AcroForm field?
What requirements does this place on the PDF template? Do I need to encode the data before
setting it as the value of the PDField? If so, which encoding should I use?
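As background to the encoding question: the PDF specification stores text strings, such as a field's /V value, either in PDFDocEncoding or as UTF-16BE prefixed with the byte order mark FE FF. PDField.setValue takes a plain Java String, so this should normally be handled by the library, and the fact that the text looks right while the field is selected suggests the stored value is probably fine. The JDK-only sketch below merely illustrates the two forms; the helper names are hypothetical, and the PDFDocEncoding check is deliberately conservative (it accepts only ranges where PDFDocEncoding agrees with ISO-8859-1).

```java
import java.nio.charset.StandardCharsets;

public class PdfTextStringSketch {

    // Conservative check: printable ASCII (0x20-0x7E) and 0xA1-0xFF except 0xAD map to
    // the same single byte in PDFDocEncoding as in ISO-8859-1.
    static boolean safeForPdfDocEncoding(String s) {
        for (char c : s.toCharArray()) {
            boolean ascii = c >= 0x20 && c <= 0x7E;
            boolean latin1 = c >= 0xA1 && c <= 0xFF && c != 0xAD;
            if (!ascii && !latin1) {
                return false;
            }
        }
        return true;
    }

    // Encodes a Java string as a PDF text string and writes it as a hex string token.
    static String toPdfTextString(String s) {
        byte[] bytes;
        if (safeForPdfDocEncoding(s)) {
            bytes = s.getBytes(StandardCharsets.ISO_8859_1);   // single-byte form
        } else {
            byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
            bytes = new byte[utf16.length + 2];
            bytes[0] = (byte) 0xFE;                            // byte order mark FE FF
            bytes[1] = (byte) 0xFF;                            // marks the string as UTF-16BE
            System.arraycopy(utf16, 0, bytes, 2, utf16.length);
        }
        StringBuilder hex = new StringBuilder("<");
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(toPdfTextString("\u00C4\u00C5\u00D6\u00F6\u00E4\u00E5")); // <C4C5D6F6E4E5>
        System.out.println(toPdfTextString("\u20ACuro"));  // Euro sign forces UTF-16BE: <FEFF20AC00750072006F>
    }
}
```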

Below is a simplified, end-to-end code sample of what I'm doing. I've tried various alternatives
for encoding the field value, and I've tried to control the font setting via the field's DA
(default appearance) entry, but with no success. In most cases the value turned out invisible
while the field was not selected, whereas selecting the field would display the data correctly.

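import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

String TEMPLATE_NAME = "Form_13349A.pdf";
InputStream is = this.getClass().getClassLoader().getResourceAsStream(TEMPLATE_NAME);
PDDocument pdfTemplate = PDDocument.load(is);
PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDField field = acroForm.getField("Field1");
String valueWithNonAsciiChars = "ÄÅÖöäå";
field.setValue(valueWithNonAsciiChars); // pre-fill the field

// Save the filled form to memory, then release the document
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
pdfTemplate.save(byteArrayOutputStream);
pdfTemplate.close();
byte[] pdf = byteArrayOutputStream.toByteArray();

// Stream the PDF to the browser (resourceResponse is the portlet's ResourceResponse)
resourceResponse.addProperty(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=Form_13349A.pdf");
OutputStream out = resourceResponse.getPortletOutputStream();
out.write(pdf);
out.flush();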
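Regarding the DA attempts mentioned above: a DA (default appearance) string selects a font resource and size with the Tf operator and usually sets a fill colour. Per the PDF specification, the font name must match a key in the Font entry of the AcroForm's default resources (DR), and a size of 0 means the text is auto-sized. The following JDK-only sketch only shows the shape of such a string; the helper name is hypothetical, Helv is the resource name Acrobat commonly uses for Helvetica, and whether a correct DA alone fixes the display depends on the appearance stream PDFBox generates, which this sketch does not address.

```java
public class DefaultAppearanceSketch {

    // Hypothetical helper: builds a DA string that selects a font resource and size
    // with Tf and sets a black fill colour with "0 g".
    // fontResourceName must match a key in the Font entry of the AcroForm's DR;
    // a size of 0 asks the viewer to auto-size the text.
    static String defaultAppearance(String fontResourceName, int fontSize) {
        if (fontResourceName.isEmpty() || fontSize < 0) {
            throw new IllegalArgumentException("need a font name and a non-negative size");
        }
        return "/" + fontResourceName + " " + fontSize + " Tf 0 g";
    }

    public static void main(String[] args) {
        System.out.println(defaultAppearance("Helv", 0));  // /Helv 0 Tf 0 g (auto-sized, black)
        System.out.println(defaultAppearance("Helv", 10)); // /Helv 10 Tf 0 g
    }
}
```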

Every hint I've found on the Internet suggests that it's a font-related problem. But frankly,
it seems as if PDFBox is messing up the text field properties while setting the value. I found
a couple of descriptions matching my problem, but no solution. The PDFBOX-283 issue seems to
describe the same problem, and there is even a patch attached, but I assume the fix has
unwanted side effects; otherwise, why was it not included in the latest version? I have not
tested the patch yet, but I probably will shortly.

As a temporary fix, I was able to get the correct result by editing the template PDF and
setting the Custom Format Script (that's what Adobe Acrobat XI calls it) of the field like so:

var txtField = event.target;
txtField.textFont = font.Helv;
txtField.textColor = color.black;

HOWEVER, this only works with Adobe Reader, not with the built-in PDF viewers in Chrome or Firefox.
Plus, this is not a very nice fix since it requires the PDF template designer to remember
to copy the script into the Custom Format Script entry for each and every field in each and
every PDF template. Most importantly though, the solution should support every major PDF viewer.

Help would be very much appreciated!

Pasi Koski
