pdfbox-users mailing list archives

From Peter Murray-Rust <pm...@cam.ac.uk>
Subject Re: Extracting vector graphics from PDF
Date Mon, 07 May 2012 13:23:35 GMT
On Mon, May 7, 2012 at 1:31 PM, Andrey Kuznetsov <imagero@gmx.de> wrote:

> Peter,
>
> The COS output is horribly formatted, so I read only the first line ;-)
>

Sorry - that is what COSDictionary.toString() gave.


> It uses a FontFile3 stream.
>
> A FontFile3 stream contains the font in either Compact Font Format (CFF) or
> OpenType Format (OTF), neither of which is supported by Java.
>
> The font name is “FKAJPF+AdvOT3b30f6db.B”, which means that it is a subset
> of the font named “AdvOT3b30f6db.B”.
>

I am ignorant about fonts so please correct any errors.
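
As far as I understand it, a subset tag in a PDF font name is six uppercase
letters followed by "+", so the base name can be recovered by stripping that
prefix. A minimal sketch in plain Java (the class and method names are my own
invention for illustration, not part of PDFBox):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class.
class SubsetFontName {
    // Assumption: a subset tag is exactly six uppercase ASCII letters plus '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+(.+)$");

    /** Returns true if the name carries a subset tag such as "FKAJPF+". */
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET_TAG.matcher(fontName).matches();
    }

    /** Returns the name without the subset tag, or the name unchanged if there is none. */
    static String baseName(String fontName) {
        Matcher m = SUBSET_TAG.matcher(fontName);
        return m.matches() ? m.group(1) : fontName;
    }

    public static void main(String[] args) {
        System.out.println(baseName("FKAJPF+AdvOT3b30f6db.B"));
    }
}
```

The base name could then be used as the key when looking for a substitute
font by name.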


> I don’t know exactly how PDFBox handles CFF/OpenType fonts; it probably
> just searches for a surrogate font (by name) or some kind of default font
> (since I have never seen such a horrible font name among the system fonts).
>

I have no idea where the font came from. It's probably created by the
publisher or bought from a supplier.


> I don’t know if this is really useful for you.
>

It's very useful! It explains why I had problems and gives me confidence in
the process.


> I also have no idea why the font name/style are not set.
>
> It may nevertheless be a valid font.
>
> BTW, the only way to make Java understand CFF/OTF fonts is to convert them
> to Type1 fonts.
>
> I doubt that there is any free Java program that could do it.
>

Thanks for the information.

> (I managed to write a parser for CFF fonts, but I still have to dig into the
> Type1 font format; however, my to-do list is really long and the Type1 format
> is not in first place ;-))
>

What does the parser do?


> Best Regards
>

I shall probably create a hack of some kind. I can find a sans-serif font and
a serif font which are "fairly close" and substitute them. How would I get a
system COSDictionary I could substitute?

I am mainly interested in:
* the identity of the characters
* the font metrics of the characters.

In this way I can guess the words and the spaces between them.
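
To make that concrete, here is a rough sketch of how I imagine guessing word
boundaries once the character identities and widths are known. The Glyph
class, the method name and the gap threshold are all hypothetical and not
PDFBox API; it also assumes the glyphs of one line arrive sorted left to right.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a PDFBox class.
class WordGrouper {
    /** One positioned character: its text, left x coordinate and advance width. */
    static final class Glyph {
        final String text;
        final double x;
        final double width;

        Glyph(String text, double x, double width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Splits a line of glyphs into words. A new word starts whenever the
     * horizontal gap between the end of the previous glyph and the start of
     * the next one exceeds gapThreshold (same units as x and width).
     */
    static List<String> groupIntoWords(List<Glyph> line, double gapThreshold) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        double previousEnd = Double.NaN; // no previous glyph yet
        for (Glyph g : line) {
            boolean gap = !Double.isNaN(previousEnd) && (g.x - previousEnd) > gapThreshold;
            if (gap && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(g.text);
            previousEnd = g.x + g.width;
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

With a substituted font the widths are only approximate, so the threshold
would need to be generous, perhaps a fraction of the font size.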

> Andrey

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
