pdfbox-users mailing list archives

From Peter Murray-Rust <pm...@cam.ac.uk>
Subject Re: Extracting vector graphics from PDF
Date Mon, 07 May 2012 09:03:25 GMT
I have followed Andrey's recommendation and am able to analyze the graphics
primitives written to a MyGraphics2D object (I generally use Batik's
SVGGraphics2D, as I have an SVG toolkit for analyzing the primitives). I am
having difficulty with the Font information.

On Mon, Apr 2, 2012 at 2:51 PM, Andrey Kuznetsov <imagero@gmx.de> wrote:

> Peter, you have to pass your own Graphics2D object (with some overridden
> methods) to pdfbox.
> MyGraphics2D extends Graphics2D {
>     public void setFont(Font) { ...}
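
For reference, this is roughly the kind of bookkeeping such an overridden setFont(Font) could do. It is only a sketch: FontLog, record and describe are hypothetical names of my own, not part of pdfbox or Batik, and it uses only the logical name, style and size stored in the java.awt.Font object, so it says nothing about how pdfbox chooses the Font it passes in.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a pdfbox or Batik class): call record(font)
 * from an overridden Graphics2D.setFont(Font) to keep a trace of every
 * font the renderer sets.
 */
class FontLog {
    private final List<String> entries = new ArrayList<>();

    /** Stores a one-line description of the font; null is logged as "null". */
    void record(Font font) {
        entries.add(font == null ? "null" : describe(font));
    }

    /** Formats logical name, style and point size, e.g. "Dialog / italic / 12.0". */
    static String describe(Font font) {
        String style;
        if (font.isBold()) {
            style = font.isItalic() ? "bold-italic" : "bold";
        } else {
            style = font.isItalic() ? "italic" : "plain";
        }
        return font.getName() + " / " + style + " / " + font.getSize2D();
    }

    /** Returns the recorded descriptions in the order they were set. */
    List<String> entries() {
        return entries;
    }

    public static void main(String[] args) {
        FontLog log = new FontLog();
        log.record(new Font("Dialog", Font.ITALIC, 12));
        log.record(new Font("Serif", Font.BOLD, 9));
        for (String entry : log.entries()) {
            System.out.println(entry);
        }
    }
}
```

If every entry comes out identical, that at least confirms the sameness is in the Font objects being passed rather than in the capturing code.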

I am using pdfbox-1.6.0 from Maven and my own version of PDFReader,
which captures the graphics. When I capture the font as above it is always:


When using PDFReader, however, all the glyphs are correctly drawn,
including italics and a variety of fonts. I have also managed to capture
graphics information where all the characters were outlines (SVGPath) rather
than SVGText, suggesting that PDFReader had written the glyphs directly.
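
To illustrate what I mean by characters arriving as outlines: when text reaches a Graphics2D as a java.awt.Shape (for example through fill(Shape)) there is no string or Font attached, only path segments. A minimal sketch of turning such a Shape into SVG path data follows. ShapeToSvgPath and toPathData are hypothetical names of my own, the triangle merely stands in for a glyph outline, and the winding rule is ignored for brevity.

```java
import java.awt.Shape;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;

/** Hypothetical helper: serialises a java.awt.Shape as SVG path data. */
class ShapeToSvgPath {

    /** Walks the shape's PathIterator and emits M/L/Q/C/Z commands. */
    static String toPathData(Shape shape) {
        StringBuilder d = new StringBuilder();
        double[] c = new double[6]; // up to three (x, y) pairs per segment
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            int type = it.currentSegment(c);
            if (d.length() > 0) {
                d.append(' ');
            }
            switch (type) {
                case PathIterator.SEG_MOVETO:  append(d, "M", c, 2); break;
                case PathIterator.SEG_LINETO:  append(d, "L", c, 2); break;
                case PathIterator.SEG_QUADTO:  append(d, "Q", c, 4); break;
                case PathIterator.SEG_CUBICTO: append(d, "C", c, 6); break;
                case PathIterator.SEG_CLOSE:   d.append("Z"); break;
                default: throw new IllegalStateException("unknown segment type " + type);
            }
        }
        return d.toString();
    }

    /** Appends the command letter followed by the first n coordinates. */
    private static void append(StringBuilder d, String cmd, double[] c, int n) {
        d.append(cmd);
        for (int i = 0; i < n; i++) {
            d.append(' ').append(c[i]);
        }
    }

    public static void main(String[] args) {
        // A triangle standing in for a glyph outline.
        Path2D outline = new Path2D.Double();
        outline.moveTo(0, 0);
        outline.lineTo(10, 0);
        outline.lineTo(0, 5);
        outline.closePath();
        System.out.println(toPathData(outline)); // prints "M 0.0 0.0 L 10.0 0.0 L 0.0 5.0 Z"
    }
}
```

Recovering the actual characters from paths like these would need separate analysis, since the path data alone carries no character codes.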

I'd be grateful for any pointers as to how I can capture the Font
information, the glyph information, or both, and what is actually happening
when the information is passed. (I am quite prepared to work with the glyphs,
as there are some documents where, I think, only glyph information is
provided, so I have to do some analysis there.)


--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
