pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: UTF16 encoded string to PDFDocEncoding
Date Tue, 11 Jul 2017 14:58:16 GMT
fixed in https://issues.apache.org/jira/browse/PDFBOX-3864

Tilman

On 11.07.2017 at 16:06, Tilman Hausherr wrote:
> The cause is the "gaps" in the PDFDocEncoding specification, which were
> missed in the implementation. I'll create an issue later.
>
> Tilman
>
> On 10.07.2017 at 19:22, Andrea Vacondio wrote:
>> Hi, we came across a case where we are basically cloning outline items
>> whose original title is a UTF-16BE-encoded text string containing the
>> value 00A0 (non-breaking space). We later use the string to set the
>> title of a new outline item, and the A0 is interpreted as a € sign.
>> Here is a simple test:
>>
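>>          // Requires org.apache.pdfbox.cos.COSString and
>>          // org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
>>          // parseHex declares IOException.
>>          // FEFF is the UTF-16BE byte order mark; the text is "Chapter" + U+00A0.
>>          COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
>>          PDOutlineItem node = new PDOutlineItem();
>>          node.setTitle(victim.getString());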
>>
>> If you look at the node dictionary, you'll see that the title value is
>> Chapter€
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>
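
For readers of the archive: in PDFDocEncoding, byte 0xA0 decodes to the euro sign (U+20AC), not to U+00A0, and some byte values (such as 0x7F, 0x9F and 0xAD) have no character at all. A writer therefore has to check that every character of a title really has a PDFDocEncoding byte before choosing that encoding, and otherwise fall back to UTF-16BE with the FE FF byte order mark. The Java class below is a minimal, self-contained sketch of that check, written without PDFBox. The names PdfDocEncodingSketch, encodeTextString and toHex are hypothetical; this is not PDFBox's implementation or the PDFBOX-3864 fix. The code table follows ISO 32000-1 Table D.2 as far as the sketch needs it, and the rarely used control codes other than TAB, LF and CR are deliberately treated as not encodable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: encode a PDF text string, using
// PDFDocEncoding only when every character has a defined byte.
public final class PdfDocEncodingSketch {
    // Unicode character -> PDFDocEncoding byte, for characters this sketch accepts.
    private static final Map<Character, Integer> UNICODE_TO_CODE = new HashMap<>();

    static {
        // Whitespace controls kept by this sketch: TAB, LF, CR.
        put(0x09, '\t');
        put(0x0A, '\n');
        put(0x0D, '\r');
        // 0x18..0x1F: breve, caron, circumflex, dotaccent, hungarumlaut, ogonek, ring, tilde.
        char[] block18 = {'\u02D8', '\u02C7', '\u02C6', '\u02D9', '\u02DD', '\u02DB', '\u02DA', '\u02DC'};
        for (int i = 0; i < block18.length; i++) put(0x18 + i, block18[i]);
        // Printable ASCII maps to itself; 0x7F is a gap and is skipped.
        for (int b = 0x20; b <= 0x7E; b++) put(b, (char) b);
        // 0x80..0x9E: bullet, daggers, ellipsis, dashes, quotes, ligatures, etc.
        char[] block80 = {
            '\u2022', '\u2020', '\u2021', '\u2026', '\u2014', '\u2013', '\u0192', '\u2044',
            '\u2039', '\u203A', '\u2212', '\u2030', '\u201E', '\u201C', '\u201D', '\u2018',
            '\u2019', '\u201A', '\u2122', '\uFB01', '\uFB02', '\u0141', '\u0152', '\u0160',
            '\u0178', '\u017D', '\u0131', '\u0142', '\u0153', '\u0161', '\u017E'};
        for (int i = 0; i < block80.length; i++) put(0x80 + i, block80[i]);
        // 0x9F is a gap; 0xA0 is the euro sign, so U+00A0 gets no entry at all.
        put(0xA0, '\u20AC');
        // 0xA1..0xFF match ISO-8859-1, except the gap at 0xAD.
        for (int b = 0xA1; b <= 0xFF; b++) {
            if (b != 0xAD) put(b, (char) b);
        }
    }

    private static void put(int code, char unicode) {
        UNICODE_TO_CODE.put(unicode, code);
    }

    /** PDFDocEncoding bytes if every char is encodable, else FE FF + UTF-16BE. */
    public static byte[] encodeTextString(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            Integer code = UNICODE_TO_CODE.get(text.charAt(i));
            if (code == null) {
                // Not representable (e.g. U+00A0): switch the whole string to UTF-16BE.
                byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
                byte[] result = new byte[utf16.length + 2];
                result[0] = (byte) 0xFE;
                result[1] = (byte) 0xFF;
                System.arraycopy(utf16, 0, result, 2, utf16.length);
                return result;
            }
            out.write(code);
        }
        return out.toByteArray();
    }

    /** Upper-case hex dump, for comparing with the hex string in the report. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeTextString("Chapter\u00A0"))); // FEFF004300680061007000740065007200A0
        System.out.println(toHex(encodeTextString("Chapter\u20AC"))); // 43686170746572A0
    }
}
```

For "Chapter" followed by a non-breaking space, the sketch produces exactly the UTF-16BE bytes from the original report, while "Chapter€" becomes the single-byte string ending in A0. Falling back to UTF-16BE as soon as one character is missing keeps the logic simple and guarantees that a PDFDocEncoding string never decodes to different text.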

