pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Help getting pt fontsize of text.
Date Thu, 13 Jul 2017 19:07:46 GMT
Hi,

The bug is somewhat minor, but it prevents PDFBox from getting the
individual glyph bounds. Even these may not always be correct.

The only non-type3 font is a standard 14 font, so there is not much
information there. However, DrawPrintTextLocations works great for that
one, i.e. the tool gets the exact bounds (shown in cyan).

You can't view type3 fonts with FontForge. Type3 glyphs are defined by
PDF content streams, and that is what your code snippet is. The operators
are described in the PDF specification:
https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
Start with the "operator summary". "13 0 2 9 11 12 d1" is a command that
sets the width and the bounds of that glyph. However, the PDF
specification requires it to be the first command, and PDFBox expects
this too. The rest of the commands mostly display an inline image (of a
glyph), which is pretty "cheap" because fonts are best displayed as
vector fonts (TrueType or Type1). Fonts as images are kind of "80s". But
I know that some conversion tools produce this.
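
To make the d1 line more concrete, here is a rough sketch in plain Java
(no PDFBox calls) that reads the six operands "wx wy llx lly urx ury" in
front of d1 and reports whether d1 really was the first operator. The
class and method names (Type3GlyphBounds, findD1) are made up for this
illustration and are not part of PDFBox. The scaling at the end assumes
an identity font matrix and ignores the text matrix and CTM, so treat the
result as an estimate only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: reads the d1 operands of a type3 charproc. */
class Type3GlyphBounds {

    /** Operands of "wx wy llx lly urx ury d1", plus whether d1 came first. */
    static final class D1 {
        final double wx, wy, llx, lly, urx, ury;
        final boolean isFirstOperator;

        D1(double[] v, boolean isFirstOperator) {
            wx = v[0]; wy = v[1]; llx = v[2]; lly = v[3]; urx = v[4]; ury = v[5];
            this.isFirstOperator = isFirstOperator;
        }

        double width()  { return urx - llx; }
        double height() { return ury - lly; }
    }

    /**
     * Scans whitespace-separated tokens for the first "d1" operator.
     * Returns null if there is none or if it does not have six numeric operands.
     * Only the text part of a charproc is handled; inline image data is not parsed.
     */
    static D1 findD1(String charProcText) {
        List<Double> operands = new ArrayList<>();
        int operatorsSeen = 0;
        for (String token : charProcText.trim().split("\\s+")) {
            Double number = tryParse(token);
            if (number != null) {
                operands.add(number);          // collect operands until an operator shows up
                continue;
            }
            if (token.equals("d1")) {
                if (operands.size() != 6) {
                    return null;
                }
                double[] v = new double[6];
                for (int i = 0; i < 6; i++) {
                    v[i] = operands.get(i);
                }
                return new D1(v, operatorsSeen == 0);
            }
            operatorsSeen++;                   // some other operator, e.g. "q"
            operands.clear();
        }
        return null;
    }

    private static Double tryParse(String token) {
        try {
            return Double.valueOf(token);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Text part of the charproc quoted in this thread (inline image omitted).
        D1 d1 = findD1("q 13 0 2 9 11 12 d1 9 0 0 3 2 9 cm");
        double fontSize = 0.24;                // font size reported in this thread
        System.out.println("d1 is first operator: " + d1.isFirstOperator);
        System.out.println("glyph box: " + d1.width() + " x " + d1.height());
        // Assumption: identity font matrix, no extra text matrix or CTM scaling.
        System.out.println("scaled height: " + d1.height() * fontSize);
    }
}
```

For the snippet in this thread the glyph box comes out 9 wide and 3 high
in glyph space, which matches the "9 0 0 3 2 9 cm" that places the 9x3
inline image, and d1 is reported as not being the first operator because
of the leading "q".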

Tilman

On 13.07.2017 at 20:42, Ernest Fayngold wrote:
> Thanks,
>
> Unfortunately for me, my code consumes PDFs. The cap height issue is on
> another page of the pdf document (
> https://drive.google.com/open?id=0B3JXBo1bPbwPNVRYaFFqNVpONU0 ). I used
> Adobe Acrobat to strip out customer info, so hopefully the issue is still
> present. The best text to demonstrate the problem is "Ending Balance". I am
> going to take a look now at DrawPrintTextLocations.java.
> Any help on where to find resources on how to actually figure this out
> would be great. I wasn't able to view the font with FontForge. I am not
> super familiar with PDFs/PostScript and have sort of been dropped into the
> fire trying to figure this out. As you mentioned, there is a bug with the
> fonts, while I am just trying to figure out what the code snippet below
> even means. Also, while there is a bug, the file still opens with Acrobat
> Reader, so there must be a way to extract the information.
>
> Really appreciate the help, thanks
>
> q
> 13 0 2 9 11 12 d1
> 9 0 0 3 2 9 cm
> BI
> /W 9 /H 3 /BPC 1 /D [1, 0] /IM true
> ID
> ÿ€ÿ€ÿ€EI
> Q
>
> On Thu, Jul 13, 2017 at 1:02 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> I had a look... the pt size is really 0.24. But the bounding box of the
>> first font is -1 -11 42 40, i.e. 40 - (-11) = 51 high. So the alleged max
>> height is about 51 x 0.24 = 12.24 units. (1 unit = 1/72 inch)
>> Nothing else influences this, and the font matrix is the identity matrix.
>>
>> Have a look at DrawPrintTextLocations.java in the source code download,
>> this will create a file with the bounds shown. Not the best, but better
>> than nothing. Showing the exact bounds for type3 fonts is difficult because
>> it isn't a vector font... even worse, your font has a bug. In the charprocs
>> of your type3 fonts, the first command should be "d1" which should be the
>> exact bounding box (and even that is unreliable). But in your font it
>> isn't, the first command is "q".
>>
>> Sorry if this is somewhat incomprehensible...
>>
>> There's also some bug in PDFDebugger that it doesn't display the glyphs.
>> (But that's just for me)
>>
>> Re CapHeight, that's another story, as your file doesn't have it.
>>
>> Tilman
>>
>>
>> On 13.07.2017 at 18:25, Ernest Fayngold wrote:
>>
>>> Not sure how to share on sharehoster, as it appears to be a German site
>>> and I don't speak it. So I dropped a sample file onto my Google Drive here:
>>> https://drive.google.com/open?id=0B3JXBo1bPbwPbTRYZi0zQTctcTA I had to
>>> modify the pdf slightly to remove customer data. Let me know if you can't
>>> open it for some reason or if there is a better place to share.
>>>
>>> Thanks
>>>
>>> On 2017-07-13 11:54 (-0400), Tilman Hausherr <T...@t-online.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is a bit tricky to answer because the pt size doesn't mean much...
>>>> there are other parameters that influence what you see on the screen.
>>>> What you need is TextPosition.getTextMatrix(). Even then, the font size
>>>> of a type3 font has its own logic, some are 1/1000 and some are not.
>>>> CapHeight may or may not be correct. The best would be you'd share the
>>>> PDF (upload to sharehoster).
>>>>
>>>> Tilman
>>>>
>>>> On 13.07.2017 at 17:28, Ernest Fayngold wrote:
>>>>
>>>>> Hi,
>>>>> I am working on parsing pdf documents and need to get a real font size.
>>>>> The document I am working with seems to define a different type 3 font for
>>>>> different sized text on the page. All the font size is set to 0.24. I am
>>>>> trying to use the TextPosition Class to get a real FontSize in Pt for the
>>>>> font. The Font descriptor is set to null, however is available for some
>>>>> pieces of text written in Arial. I have tried using CapHeight in those
>>>>> cases, but ran into an issue where there is a ton of white space above and
>>>>> below the text that is being captured by the cap height. I was hoping
>>>>> someone knew a way to get the real font size of the text, as I am at a
>>>>> complete loss.
>>>>> Thanks,
>>>>> Ernest
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>
>>>>
>>>>
>



