pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Problem/Question about UTF-16 characters
Date Tue, 15 Oct 2013 18:48:05 GMT

On 13.10.13 22:33, Karcher, Glenn wrote:
> Hi,
> I am having a problem when attempting to output a string containing Unicode characters.
> If the Unicode sequence corresponds to a single-byte character (e.g., the Registered
> Trademark symbol, U+00AE), the character is output correctly. However, if the character
> is a 2-byte value (e.g., the Trademark character (TM), U+2122), the string is generated
> as UTF-16BE as expected, but the output file is drawn with the FE and FF BOM characters
> and the 21 and 22 bytes as single-byte characters.
> Is there something that I need to initialize to properly handle the UTF-16 characters
> (the most likely solution)? Is it a bug in PDFBox? Is it a quirk in Reader X (least
> likely, since I have seen the TM character displayed correctly in other documents)?
> Any help and pointers on how to deal with this problem will be greatly appreciated.
PDFBox doesn't support UTF-encoded text yet; see [1] for further details.
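
For illustration only, here is a small plain-Java sketch of the symptom described above. It does not call PDFBox at all; the class and method names (Utf16Demo, utf16WithBom, asSingleBytes, hex) are made up for this example, and ISO-8859-1 merely stands in for whatever single-byte encoding the font ends up using. It shows that U+2122 encoded as UTF-16 with a BOM is the four bytes FE FF 21 22, and that reading those bytes one at a time yields four separate characters instead of one.

```java
import java.nio.charset.StandardCharsets;

class Utf16Demo {
    // Java's "UTF-16" charset encodes big-endian and prepends the BOM FE FF.
    static byte[] utf16WithBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Assumption for illustration: treat every byte as one ISO-8859-1
    // character, the way a single-byte font encoding would.
    static String asSingleBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00AE fits in one byte in a Latin-1 style encoding.
        System.out.println(hex("\u00AE".getBytes(StandardCharsets.ISO_8859_1)));

        // U+2122 needs two bytes, plus two more for the BOM.
        byte[] tm = utf16WithBom("\u2122");
        System.out.println(hex(tm));

        // Four characters come out where one was intended.
        System.out.println(asSingleBytes(tm).length());
    }
}
```

If the bytes of a string are drawn one per glyph like this, the BOM and the two halves of the code unit each show up as their own character, which would be consistent with the output you are seeing.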


> Best regards,
> --Glenn Karcher

Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-922
