pdfbox-users mailing list archives

From Jerry <jerry.o.f...@gmail.com>
Subject bold and italic font variants misbehaving
Date Tue, 29 Mar 2016 22:00:15 GMT
I have written an application that generates an .epub document from user 
input.

I am now trying to use PdfBox to add PDF output of the same source text. 
But I have encountered problems when trying to render bold or italic text:

- In the italic font, the characters u and i in the word "quick" 
overlap.

- In the word pair "brown fox" (where "brown" is in the plain font and 
"fox" is italic), there is no space between the words, but there is an 
extra space between the f and the o in "fox".

- In the phrase "dog and ran" (which is bold), the single space between 
"and" and "ran" is too wide, and there is no space between "ran" and 
the next word.

And yet, the same string is rendered with correct spacing when output as 
plain text (no font changes).

See the output files at:

https://www.dropbox.com/s/ox4arbrfiv5jqfu/withNoHtmlTags.pdf?dl=0
https://www.dropbox.com/s/wgj029hm4wre1x5/withItalicsAndBoldFonts.pdf?dl=0

As a newbie to both PDF and PdfBox, I started with a tutorial I found at 
http://www.coderanch.com/t/659953/Wiki/PDFBox. Once I verified that I 
had entered the tutorial correctly by running it and viewing the output, 
I began experimenting by displaying a simple test string that is long 
enough to require word wrapping. When I got that to work, I tried adding 
bold and italic HTML tags to the string (since the end goal is to create 
PDF from .epub source).

Here is my test code:

https://www.dropbox.com/s/k9d22s0xsgg8tz8/TestBed.java?dl=0

In TestBed.java, doTutorial() is the unmodified tutorial.

The method doMyCode() displays the test string by breaking it into 
individual whole words. If I mark words with <i> and <b> tags, they are 
correctly rendered with bold and italic fonts. But this limits font 
changes to whole words only, which rules out a font change in the middle 
of a string of characters. To handle that I need to output individual 
characters, not words.
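Changing fonts in the middle of a word does not necessarily require drawing one character at a time. A minimal sketch, assuming the only markup is `<b>`, `</b>`, `<i>` and `</i>` (no attributes, no other tags): split the string into runs of text that share one style, keeping spaces inside the runs, and then draw each run with one call. The `Style` and `Run` types and the `parse` method are hypothetical names for illustration, not part of PDFBox or of TestBed.java.

```java
import java.util.ArrayList;
import java.util.List;

public class StyledRuns {
    enum Style { PLAIN, BOLD, ITALIC, BOLD_ITALIC }

    /** A maximal stretch of text drawn in a single style (hypothetical type). */
    static final class Run {
        final String text;
        final Style style;
        Run(String text, Style style) { this.text = text; this.style = style; }
        @Override public String toString() { return style + ":\"" + text + "\""; }
    }

    static Style styleOf(boolean bold, boolean italic) {
        if (bold && italic) return Style.BOLD_ITALIC;
        if (bold) return Style.BOLD;
        if (italic) return Style.ITALIC;
        return Style.PLAIN;
    }

    /** Splits text containing <b>, </b>, <i>, </i> into style runs; spaces stay in the runs. */
    static List<Run> parse(String s) {
        List<Run> runs = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean bold = false, italic = false;
        int i = 0;
        while (i < s.length()) {
            String tag = null;
            for (String t : new String[] {"<b>", "</b>", "<i>", "</i>"}) {
                if (s.startsWith(t, i)) { tag = t; break; }
            }
            if (tag == null) {
                buf.append(s.charAt(i++)); // ordinary character: extend the current run
                continue;
            }
            // A tag changes the style, so close the current run first (skip empty runs).
            if (buf.length() > 0) {
                runs.add(new Run(buf.toString(), styleOf(bold, italic)));
                buf.setLength(0);
            }
            switch (tag) {
                case "<b>":  bold = true;    break;
                case "</b>": bold = false;   break;
                case "<i>":  italic = true;  break;
                default:     italic = false; break; // "</i>"
            }
            i += tag.length();
        }
        if (buf.length() > 0) runs.add(new Run(buf.toString(), styleOf(bold, italic)));
        return runs;
    }

    public static void main(String[] args) {
        for (Run r : parse("The <i>qu</i>ick brown <i>fox</i>")) {
            System.out.println(r);
        }
    }
}
```

In PDFBox 2.0 terms, each run could then be drawn with `contentStream.setFont(font, fontSize)` followed by `contentStream.showText(run.text)`, picking `font` from the run's style (for example `PDType1Font.TIMES_ITALIC` for ITALIC or `PDType1Font.TIMES_BOLD_ITALIC` for BOLD_ITALIC). Word wrapping would still have to measure each run with its own font.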

The method doMyCode2() displays the test string word by word unless a 
word contains an HTML tag, in which case that word is rendered 
character by character.

If the test string contains no tags, it renders correctly.

See the sample file withNoHtmlTags.pdf.

When <i> and <b> tags are encountered, fonts get changed to 
PDType1Font.TIMES_BOLD or PDType1Font.TIMES_ITALIC as required, and the 
string is rendered, but the character spacing is mangled.

See the sample file withItalicsAndBoldFonts.pdf.
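When text in different fonts is positioned by hand, each piece has to be advanced by a width measured with the same font that draws it; measuring a bold or italic piece with the plain font's widths is one possible way to get overlaps and stray gaps like the ones described above. This is a cause worth ruling out, not a diagnosis of TestBed.java. A minimal sketch of the bookkeeping, assuming a hypothetical `Metrics` interface that returns widths in thousandths of a text-space unit (the unit PDFBox's `PDFont.getStringWidth` uses), so the width in points is `units / 1000 * fontSize`. The width tables are made-up numbers, not real Times metrics.

```java
import java.util.HashMap;
import java.util.Map;

public class RunLayout {
    /** Hypothetical font metrics: string width in 1/1000 text-space units. */
    interface Metrics {
        float stringWidthUnits(String s);
    }

    /** Toy metrics from a per-character table; characters not in the table are 500 units. */
    static Metrics table(Map<Character, Float> widths) {
        return s -> {
            float total = 0f;
            for (char c : s.toCharArray()) {
                total += widths.getOrDefault(c, 500f);
            }
            return total;
        };
    }

    /**
     * Returns the x position (in points) where each run starts on a line that
     * begins at startX. Run k is measured with fonts[k], the font that draws it.
     */
    static float[] runStarts(String[] texts, Metrics[] fonts, float fontSize, float startX) {
        float[] starts = new float[texts.length];
        float x = startX;
        for (int k = 0; k < texts.length; k++) {
            starts[k] = x;
            // Thousandths of a text-space unit -> points at this font size.
            x += fonts[k].stringWidthUnits(texts[k]) / 1000f * fontSize;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Made-up widths for illustration only.
        Map<Character, Float> roman = new HashMap<>();
        roman.put(' ', 250f);
        Map<Character, Float> italic = new HashMap<>();
        italic.put(' ', 250f);
        italic.put('f', 278f);
        Metrics romanM = table(roman), italicM = table(italic);

        String[] texts = {"brown ", "fox", " jumped"};
        float[] starts = runStarts(texts, new Metrics[] {romanM, italicM, romanM}, 12f, 50f);
        for (float s : starts) {
            System.out.println(s);
        }
    }
}
```

Manual bookkeeping like this is only needed when runs are positioned explicitly. In PDF, drawing text with `Tj` (PDFBox `showText`) advances the text position by the drawn width, so consecutive `setFont`/`showText` calls inside one `beginText()`/`endText()` block continue where the previous run ended. Also note that `Td` (PDFBox `newLineAtOffset`) offsets from the start of the current line, not from the end of the last `showText`.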

Both of these files were generated by the same code, the doMyCode2() 
method; the only change was adding or removing <i> and <b> tags in the 
string paraText.

It does not appear to be a font problem, but rather a rendering 
problem. I get the same (well, nearly the same) results with both Times 
and Helvetica. The "nearly the same" is the positioning of the u and i 
characters in the word "quick": they still overlap, but in the 
Helvetica rendering the i sits in the middle of the u, while in the 
Times rendering the i overlaps the last stroke of the u, so that it 
looks like a u with a dot over its tail.

What can I do to fix this?

Thanks.

Jerry


