pdfbox-users mailing list archives

From Manuel Aristarán <man...@jazzido.com>
Subject Re: Extracting vector graphics from pdf
Date Tue, 28 Feb 2017 01:08:12 GMT

> On Feb 27, 2017, at 7:20 AM, Allison, Timothy B. <tallison@mitre.org> wrote:
> I'm currently extracting images from a whole lot of pdf files, however some of the images
> (or figures) are somehow not extracted. I'm thinking it might have to do with the fact that
> those images are vector graphics (as is usually the case in a lot of scientific papers). My question
> is, is it possible to extract vector graphics from pdfs using Tika?

We do some of that in Tabula: https://github.com/tabulapdf/tabula-java/blob/master/src/main/java/technology/tabula/ObjectExtractor.java#L181-L270
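
For anyone wondering what "vector graphics" means at the content stream level: a figure drawn as vectors is not an image object at all, just a sequence of path operators (m, l, c, h, re) followed by a painting operator (S, f, B, ...). Below is a rough, stdlib-only Java sketch of that idea, which collects the bounding box of each painted path. It is not Tabula's code and does not use PDFBox; the class and method names are made up for illustration, and it assumes an already decoded, whitespace-separated content stream fragment with numeric operands only. It ignores the transformation matrix (cm), clipping, strings, names, arrays and inline images, all of which a real extractor has to handle.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Tabula's ObjectExtractor, not a PDFBox API.
public class PathOperatorSketch {

    /** Returns the bounding box of every painted path in a content stream fragment. */
    public static List<Rectangle2D> paintedPathBounds(String contentStream) {
        List<Rectangle2D> bounds = new ArrayList<>();
        List<Double> operands = new ArrayList<>();
        Path2D.Double path = new Path2D.Double();

        for (String token : contentStream.trim().split("\\s+")) {
            try {
                // Numbers are operands; they precede the operator that consumes them.
                operands.add(Double.parseDouble(token));
                continue;
            } catch (NumberFormatException notANumber) {
                // Anything else is treated as an operator below.
            }
            double[] a = operands.stream().mapToDouble(Double::doubleValue).toArray();
            operands.clear();

            switch (token) {
                case "m": path.moveTo(a[0], a[1]); break;
                case "l": path.lineTo(a[0], a[1]); break;
                case "c": path.curveTo(a[0], a[1], a[2], a[3], a[4], a[5]); break;
                case "h": path.closePath(); break;
                case "re": // operands: x y width height
                    path.moveTo(a[0], a[1]);
                    path.lineTo(a[0] + a[2], a[1]);
                    path.lineTo(a[0] + a[2], a[1] + a[3]);
                    path.lineTo(a[0], a[1] + a[3]);
                    path.closePath();
                    break;
                // Painting operators end the current path. The closing variants
                // (s, b, b*) are lumped in because closing does not change bounds.
                case "S": case "s": case "f": case "F": case "f*":
                case "B": case "B*": case "b": case "b*":
                    bounds.add(path.getBounds2D());
                    path = new Path2D.Double();
                    break;
                case "n": // ends the path without painting it
                    path = new Path2D.Double();
                    break;
                default:
                    // Graphics state and other operators are ignored in this sketch.
                    break;
            }
        }
        return bounds;
    }

    public static void main(String[] args) {
        // A stroked triangle followed by a filled rectangle.
        String fragment = "10 20 m 110 20 l 110 70 l h S 0 0 50 25 re f";
        for (Rectangle2D box : paintedPathBounds(fragment)) {
            System.out.println(box);
        }
    }
}
```

Grouping nearby boxes into one "figure" region, and rendering or cropping that region, is a separate step that this sketch does not attempt.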

Manuel Aristarán <manuel@jazzido.com>
