pdfbox-users mailing list archives

From: "Bain, Michael" <Michael.B...@McKesson.com>
Subject: RE: Problem with Unicode text in PDF form text field
Date: Wed, 12 Jun 2013 11:19:40 GMT

You may be in luck!  I had a similar problem with characters that appeared to be missing
but weren’t, and I just figured it out last night.  I found an answer on Stack Overflow that
describes in detail the challenge of replacing text in a pre-existing PDF.

http://stackoverflow.com/questions/15964704/java-pdfbox-reading-and-modifying-a-pdf-with-special-characters-diacritics

See the answer by Plinth.  In my case, any character that had not already been used in the
PDF when it was rendered disappeared.  Reading his post made me realize that the file embeds
only a subset of the font.  To test this, I added a junk line containing all of the characters
to my file during rendering, and that cleared up the problem.  The color and size of the line
don’t seem to matter; what counts is whether the renderer decides that a character is needed
in the subset.  Hope this helps!
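
A minimal sketch of how such a junk line could be assembled, assuming the texts that will later
be written into the PDF are known in advance. The class and method names (SubsetWarmup,
distinctCharacters) are hypothetical and not part of PDFBox; the sketch only builds the string,
and drawing it with the embedded font during rendering is left to whatever tool creates the file.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: collects the characters a junk line would need.
final class SubsetWarmup {

    /** Returns every distinct code point of the given texts, in first-seen order. */
    static String distinctCharacters(String... texts) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (String text : texts) {
            text.codePoints().forEach(seen::add);
        }
        StringBuilder line = new StringBuilder();
        for (int codePoint : seen) {
            line.appendCodePoint(codePoint);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source independent of the file encoding:
        // a-umlaut, u-umlaut, o-umlaut, sharp s, then four Cyrillic letters.
        String junkLine = distinctCharacters(
                "\u00e4\u00fc\u00f6 \u00df",
                "\u0424 \u0444 \u0419 \u0439");
        System.out.println(junkLine);
    }
}
```

Each character appears once in the result, so the junk line stays short no matter how long
the input texts are.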

Thanks...Mike

From: Steffen R. [mailto:raubvogel87@googlemail.com]
Sent: Wednesday, June 12, 2013 4:18 AM
To: users@pdfbox.apache.org
Subject: Problem with Unicode text in PDF form text field

Hello,
I am facing a problem that might be a bug. This is the scenario: Loading a PDF, filling in
some form text fields and saving it back to PDF. When I do this

PDDocument doc = null;
try
{
    doc = PDDocument.load( "Test.pdf" );

    PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
    PDVariableText field = (PDVariableText) form.getField("testField");
    field.setValue("Test it 123456789012345 äüö?ß! á Ф ф Й й άγγελος");

    doc.save( "TestFilled.pdf" );
}
catch (COSVisitorException e)
{
    e.printStackTrace();
}
catch (IOException e)
{
    e.printStackTrace();
}
finally
{
    if( doc != null )
    {
        try
        {
            doc.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}

with the attached PDF file (created from scratch with Acrobat XI Standard), the field is filled
in the saved PDF file, but the characters are not displayed as written in the code. And now the
most curious thing: if you click into the form field, the correct text is shown. Very strange.
Is someone facing a similar problem? Is this a known bug? Does a workaround or patch exist?
I took a look at the source code. It seems that besides the normal field value, an additional
"appearance" for displaying the field value is added, which may not support Unicode the way
it is currently implemented.
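
A rough way to see which characters of a field value might be affected is sketched below. It
assumes that the generated appearance is limited to a single-byte encoding, which the message
above only suspects; ISO-8859-1 is used here as a stand-in for such an encoding and is not
necessarily what PDFBox uses. The class and method names (EncodingCheck, unencodable) are
hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, not part of PDFBox.
final class EncodingCheck {

    /**
     * Returns the characters of text that ISO-8859-1 cannot encode,
     * in order of first appearance and without duplicates.
     */
    static String unencodable(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c) && missing.indexOf(String.valueOf(c)) < 0) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source independent of the file encoding:
        // "Test" plus a-umlaut, sharp s, and two Cyrillic letters.
        String value = "Test \u00e4\u00df \u0424 \u0444";
        // Under the stated assumption, only the Cyrillic letters are reported.
        System.out.println(unencodable(value));
    }
}
```

If the characters reported this way match the ones that are displayed incorrectly in the saved
file, that would support the guess that the appearance, and not the stored field value, is the
part that loses them.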

Thanks in advance for any help,
Steffen Harbich
