pdfbox-users mailing list archives

From Ygor Mutti <ygormu...@jusbrasil.com.br>
Subject Re: TextPosition.getIndividualWidths() returns array with less items than expected
Date Tue, 19 Jul 2016 21:09:30 GMT
Yes, it helps. Thank you for the prompt answer!

I wonder why the string returned by getUnicode contains the separate chars
instead of the ligature. Is there some way I can configure PDFTextStripper
to decode it as it is in the PDF?
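
For illustration, here is a minimal JDK-only sketch of how a single ligature character can correspond to two separate chars. It assumes Unicode compatibility normalization (NFKC) as the mechanism, which is only one possible explanation: the mapping could equally come from the font's ToUnicode data, and nothing here claims this is what PDFBox does internally. The class name, the helper `decompose`, and the width value are hypothetical.

```java
import java.text.Normalizer;

public class LigatureWidthsSketch {

    // Hypothetical helper: applies NFKC compatibility normalization,
    // which replaces presentation forms such as U+FB01 with their
    // plain-letter equivalents.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 LATIN SMALL LIGATURE FI: one char, one glyph.
        String ligature = "\uFB01";
        String text = decompose(ligature);

        // One width per character code (made-up value), mirroring the
        // situation described in the thread: one code, two chars.
        float[] widths = { 5.5f };

        System.out.println("chars in text:  " + text.length());
        System.out.println("widths entries: " + widths.length);

        // Code that indexes widths by char position must guard against
        // the two lengths being different.
        if (text.length() != widths.length) {
            System.out.println("lengths differ; do not index widths by char position");
        }
    }
}
```

The practical consequence is that code pairing each char of the string with a width should check both lengths first rather than assume they match.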

On Tue, Jul 19, 2016 at 4:47 PM Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 19.07.2016 at 20:43, Ygor Mutti wrote:
> > Hi!
> >
> > The javadoc states that the TextPosition.getIndividualWidths() method
> > returns "An array that is the same length as the length of the string."
> > Here is a gist containing a test case in which this statement is false:
> > https://gist.github.com/ygormutti/d40a80d425d552151625a063fb29c9ca
>
> I'd say the javadoc is wrong. It is the length of the CharacterCodes
> array, not the length of the unicode string. The "fi" in Justificação is
> one glyph, a ligature.
>
> This is the content stream:
>
> [ (J) 20 (usti\037ca\347\343o) ] TJ
>
> Does this explanation help?
>
> Tilman
>
> >
> > It prints a line for two cases where TextPosition.getUnicode() returns
> > "fi" while at the same time TextPosition.getIndividualWidths() returns an
> > array containing a single float.
> >
> > I've tried to pin down the version in which this behavior has been
> > introduced and found out it works as expected in 1.2.1 release and does
> > not since 1.3.0.
> >
> > Should I open a ticket for this?
> >
>
>
