pdfbox-users mailing list archives

From "Andrey Kuznetsov" <imag...@gmx.de>
Subject Re: Extracting vector graphics from PDF
Date Mon, 07 May 2012 09:10:03 GMT
Hi Peter,


did you try to trace where setFont() gets called from?
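
One way to do that, as a minimal sketch: capture a stack trace at the top of the overridden method and record the frame two levels up. CallTracer and Demo below are hypothetical names used only for illustration and are not part of PDFBox or AWT. Demo.setFont(String) stands in for your MyGraphics2D.setFont(Font), where record() would be the first statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox or AWT.
class CallTracer {
    private final List<String> callers = new ArrayList<String>();

    // Call as the first statement of the method you want to trace.
    void record() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] = record(), stack[1] = the traced method, stack[2] = its caller
        if (stack.length > 2) {
            callers.add(stack[2].getClassName() + "." + stack[2].getMethodName());
        } else {
            callers.add("<unknown>");
        }
    }

    List<String> getCallers() {
        return callers;
    }

    // Stand-in for a Graphics2D subclass and the code that drives it.
    static class Demo {
        final CallTracer tracer = new CallTracer();

        void setFont(String fontName) {   // in real code: setFont(java.awt.Font)
            tracer.record();
        }

        void renderPage() {
            setFont("Times-Italic");      // the call site we want to discover
        }
    }
}
```

After the page has been drawn, getCallers() lists the class and method behind each setFont() call, which should show whether the fonts that come out wrong ever reach setFont() at all.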


Best Regards

From: peter.murray.rust@googlemail.com
[mailto:peter.murray.rust@googlemail.com] On Behalf Of Peter Murray-Rust
Sent: Monday, 7 May 2012 11:03
To: Andrey Kuznetsov
Cc: users@pdfbox.apache.org
Subject: Re: Extracting vector graphics from PDF


I have followed Andrey's recommendation and am able to analyze graphics
primitives written to a MyGraphics2D object (I generally use Batik's
SVGGraphics2D, as I have an SVG toolkit for analyzing the primitives). I am
having difficulty with the Font information.

On Mon, Apr 2, 2012 at 2:51 PM, Andrey Kuznetsov <imagero@gmx.de> wrote:

Peter, you have to pass your own Graphics2D object (with some overridden
methods) to pdfbox.

class MyGraphics2D extends Graphics2D {

    public void setFont(Font font) { ... }
}

I am using pdfbox-1.6.0 from Maven and my own version of PDFReader,
which captures the graphics. When I capture the font as above it is always:


When using PDFReader, however, all the glyphs are correctly drawn, including
italic and a variety of fonts. I have also managed to capture graphic
information where all the characters were outlines (SVGPath) rather than
SVGText, suggesting that PDFReader had written the glyphs directly.

I'd be grateful for any pointers as to how I can capture the Font
information, the glyph information, or both, and what is actually happening
when the information is passed. (I am quite prepared to work with the glyphs,
as there are some documents where, I think, only glyph information is
provided, so I have to do some analysis there.)



Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
