pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: UTF16 encoded string to PDFDocEncoding
Date Tue, 11 Jul 2017 10:20:27 GMT

> Andreas Lehmkühler <andreas@lehmi.de> wrote on 11 July 2017 at 12:17:
> 
> > Andrea Vacondio <andrea.vacondio@gmail.com> wrote on 10 July 2017 at 19:22:
> > 
> > Hi, we came across a case where we are cloning outline items whose
> > original title is a UTF-16BE encoded text string containing U+00A0
> > (non-breaking space). We later use that string to set the title of a new
> > outline item, and the A0 is read back as a € sign (0xA0 is the Euro sign
> > in PDFDocEncoding). Here is a simple test:
> > 
> >         COSString victim = COSString
> >                 .parseHex("FEFF004300680061007000740065007200A0");
> >         PDOutlineItem node = new PDOutlineItem();
> >         node.setTitle(victim.getString());
> > 
> > If you look at the node dictionary you'll see that the title value is
> > Chapter€
> How do you look at the dictionary?
> 
> The following code:
> COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
> System.out.println(victim.toHexString());
> System.out.println(victim.getString());
Oops, part of my previous message was missing...

The output looks good to me:
FEFF004300680061007000740065007200A0
Chapter 
Note that the second line ends with a space (the non-breaking space U+00A0).
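A minimal, stdlib-only Java sketch of the round trip in this thread, under stated assumptions: it does not call PDFBox, the helpers `decode`, `encodeNaive` and `encodeSafe` are hypothetical, and the single-byte decoder models only the 0xA0 → € deviation of PDFDocEncoding (all other deviations are ignored). It shows why decoding the UTF-16BE bytes gives "Chapter" plus U+00A0 (the output above looks correct), while an encoder that treats every char up to 0xFF as single-byte writes the byte 0xA0, which is then read back as €. Whether the `setTitle`/`COSString` path in PDFBox behaves exactly like `encodeNaive` is an assumption, not something confirmed here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stdlib-only model of the title round trip; not PDFBox code.
public class NbspRoundTrip {

    // Turn a hex string such as "FEFF0043..." into raw bytes.
    static byte[] parseHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Simplified PDF text-string decoder: a FE FF prefix means UTF-16BE;
    // otherwise bytes are read as Latin-1, except 0xA0, which PDFDocEncoding
    // defines as the Euro sign (other PDFDocEncoding deviations are ignored).
    static String decode(byte[] b) {
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16BE);
        }
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            int code = x & 0xFF;
            sb.append(code == 0xA0 ? '\u20AC' : (char) code);
        }
        return sb.toString();
    }

    // FE FF byte order mark followed by the UTF-16BE code units.
    static byte[] utf16WithBom(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFE);
        out.write(0xFF);
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Naive encoder: assumes any char <= 0xFF is single-byte encodable,
    // so U+00A0 becomes byte 0xA0, which decode() reads as the Euro sign.
    static byte[] encodeNaive(String s) {
        for (char c : s.toCharArray()) {
            if (c > 0xFF) {
                return utf16WithBom(s);
            }
        }
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Safer encoder: keep the single-byte form only if it round-trips exactly,
    // otherwise fall back to UTF-16BE with a byte order mark.
    static byte[] encodeSafe(String s) {
        byte[] single = encodeNaive(s);
        return decode(single).equals(s) ? single : utf16WithBom(s);
    }

    public static void main(String[] args) {
        String title = decode(parseHex("FEFF004300680061007000740065007200A0"));
        System.out.println(title.equals("Chapter\u00A0"));                    // true
        System.out.println(decode(encodeNaive(title)).equals("Chapter\u20AC")); // true
        System.out.println(decode(encodeSafe(title)).equals(title));          // true
    }
}
```

The round-trip check in `encodeSafe` is one possible design choice: it keeps the compact single-byte form whenever that is lossless and switches to UTF-16BE only for strings, like this title, that a single-byte encoding would silently change.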


Andreas


