pdfbox-users mailing list archives

From Jan Agermose // Conviator ApS ...@conviator.com>
Subject RE: encoding
Date Sun, 09 Feb 2014 21:07:04 GMT
Ehmm, OK... so the last post gave me an idea. I opened the PDF in Reader, filled in
æøåÆØÅ in one of the fields, saved the PDF, read it in using PDFBox, and wrote out
the value as a byte array to see what is stored in it. It gave me some unexpected results:
exactly the same values as if I had hardcoded a string in Java with æøåÆØÅ,
converted it to a byte array, and printed out the values. Then I tried to insert new values
into that one exact field, values containing ÆØÅ.
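
For reference, the "hardcoded string" half of that comparison can be sketched roughly like this. It is only an illustration: the class name ByteDump and the helper hex are made up here, and the charsets (ISO-8859-1 and UTF-16BE) are assumptions, since the message does not say which one was used for the conversion. The characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: dumps the bytes of a string as hex.
final class ByteDump {

    // Formats each byte as two uppercase hex digits, separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six Danish letters (lowercase then uppercase), as escapes.
        String danish = "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5";

        // One byte per character in ISO-8859-1 (assumed charset).
        // prints: E6 F8 E5 C6 D8 C5
        System.out.println(hex(danish.getBytes(StandardCharsets.ISO_8859_1)));

        // Two bytes per character in UTF-16BE (assumed charset, no BOM).
        // prints: 00 E6 00 F8 00 E5 00 C6 00 D8 00 C5
        System.out.println(hex(danish.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

Running the same hex dump on the bytes of the value read back from the form field would show whether the two really match byte for byte.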

This works. In that one field. AND the font is different.

I'm thinking that the real problem is in the initial creation of the PDF. The document is made
in OpenOffice, OpenOffice is then used to export it to PDF, and that PDF is then used in my code.

I'm guessing that we should look at how the PDF is created in the first place. My coworker
is not Danish. Maybe his OpenOffice is setting some font that just does not make sense for
Danish, and so if he used my OpenOffice or ... fixed his, then I would not have a problem.

So that's where we will look. Or I can simply open the PDF in my Reader, fill in æ in all
fields, save it, and write over everything in Java :D

If it was a bigger job, maybe it would make sense to really understand what's going on ...

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andreas@lehmi.de] 
Sent: 9 February 2014 19:44
To: users@pdfbox.apache.org
Subject: Re: encoding


On 08.02.2014 17:31, Jan Agermose // Conviator ApS wrote:
> Hi
> I'm trying to use this code to fill a document. It works - except for
> encoding because of Danish chars: æøå
>              PDDocument pdfDocument = PDDocument.load(path);
>              PDType1Font font = PDType1Font.HELVETICA;
>              //contentStream.setFont(font, 12);
>              PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
>              PDAcroForm acroForm = docCatalog.getAcroForm();
>              List<PDField> fields = acroForm.getFields();
>              for (PDField field : fields) {
>                  if (field.getFullyQualifiedName().equals("Text1")) {
>                      field.setValue(p.getFornavn() + " " + p.getEfternavn());
>                  }
>              }
>              File f = File.createTempFile("ansoegningsyddanmark",".pdf");
>              pdfDocument.save(f);
> I'm also trying to change this:
>                      field.setValue(p.getFornavn() + " " + p.getEfternavn());
> into one of:
>                      field.setValue(p.getFornavn() + " " + p.getEfternavn() + "\0153u");
>                      field.setValue(new String(p.getBy().getBytes("UTF-16"), "ISO8859_1"));
> in order to try to fix it, but it's not working.
> Any ideas how to fix this?
Encodings other than WinAnsi aren't supported yet; see PDFBOX-922 [1] for further details.

Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-922
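
As a side note on the two workarounds quoted above, here is a small sketch in plain Java (no PDFBox involved) of what those expressions evaluate to. The class name ReencodeDemo and the method reencode are made up for illustration. It only shows what the strings contain before they ever reach setValue; it makes no claim about how PDFBox handles them afterwards.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: inspects the strings produced by the two attempted workarounds.
final class ReencodeDemo {

    // Mirrors the second attempt: encode as UTF-16, then decode those bytes as ISO-8859-1.
    public static String reencode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_16), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e6\u00f8\u00e5"; // the three lowercase Danish letters
        String mangled = reencode(original);

        // Java's UTF-16 encoder writes a byte-order mark (FE FF) followed by two
        // bytes per character, so 3 characters turn into 8 single-byte characters.
        System.out.println(original.length()); // prints 3
        System.out.println(mangled.length());  // prints 8

        // prints: U+00FE U+00FF U+0000 U+00E6 U+0000 U+00F8 U+0000 U+00E5
        for (char c : mangled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();

        // The first attempt appends the literal "\0153u". In Java source this is the
        // octal escape \015 (carriage return) followed by the characters '3' and 'u'.
        String literal = "\0153u";
        System.out.println(literal.length());          // prints 3
        System.out.println(literal.charAt(0) == '\r'); // prints true
    }
}
```

So the second attempt prepends two extra characters and interleaves NULs instead of changing the encoding, and the first one appends a carriage return plus "3u".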
