pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Extracting vector graphics from pdf
Date Mon, 27 Feb 2017 14:38:20 GMT
http://stackoverflow.com/a/38933039/535646

The approach in that answer lets you collect the lines. However, it won't output an image.
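As a rough illustration of what "collecting the lines" means: vector figures are drawn with path operators in the page content stream rather than stored as image objects, so they have to be read as operators. The sketch below is plain Java, not the PDFBox API and not the code from the linked answer. The names PathLineCollector and Segment are made up for this example, and it assumes the content stream is already available as uncompressed text. With PDFBox 2.x you would normally let the library do the parsing, for example by subclassing PDFGraphicsStreamEngine and recording the moveTo/lineTo callbacks.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (hypothetical helper, not part of PDFBox): collects straight
// line segments from the text of an uncompressed PDF content stream.
// Handles only the operators m (moveto), l (lineto), h (closepath) and
// re (rectangle). Curves, the transformation matrix (cm), string literals
// and inline images are not handled.
final class PathLineCollector {

    /** One straight segment in untransformed user-space coordinates. */
    static final class Segment {
        final double x1, y1, x2, y2;

        Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        @Override
        public String toString() {
            return "(" + x1 + ", " + y1 + ") -> (" + x2 + ", " + y2 + ")";
        }
    }

    static List<Segment> collect(String contentStream) {
        List<Segment> segments = new ArrayList<>();
        List<Double> operands = new ArrayList<>();
        double startX = 0, startY = 0; // start of the current subpath
        double curX = 0, curY = 0;     // current point
        boolean hasPoint = false;

        for (String token : contentStream.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Numbers are operands; they precede their operator.
            if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(Double.parseDouble(token));
                continue;
            }
            int n = operands.size();
            if (token.equals("m") && n >= 2) {
                startX = curX = operands.get(n - 2);
                startY = curY = operands.get(n - 1);
                hasPoint = true;
            } else if (token.equals("l") && hasPoint && n >= 2) {
                double x = operands.get(n - 2);
                double y = operands.get(n - 1);
                segments.add(new Segment(curX, curY, x, y));
                curX = x;
                curY = y;
            } else if (token.equals("h") && hasPoint) {
                // Close the subpath with a segment back to its start.
                if (curX != startX || curY != startY) {
                    segments.add(new Segment(curX, curY, startX, startY));
                }
                curX = startX;
                curY = startY;
            } else if (token.equals("re") && n >= 4) {
                double x = operands.get(n - 4);
                double y = operands.get(n - 3);
                double w = operands.get(n - 2);
                double h = operands.get(n - 1);
                segments.add(new Segment(x, y, x + w, y));
                segments.add(new Segment(x + w, y, x + w, y + h));
                segments.add(new Segment(x + w, y + h, x, y + h));
                segments.add(new Segment(x, y + h, x, y));
                startX = curX = x;
                startY = curY = y;
                hasPoint = true;
            }
            // Any operator, handled or not, consumes the pending operands.
            operands.clear();
        }
        return segments;
    }

    public static void main(String[] args) {
        // A triangle drawn with moveto, two linetos and closepath.
        for (Segment s : collect("10 10 m 100 10 l 100 50 l h S")) {
            System.out.println(s);
        }
    }
}
```

Like the approach in the linked answer, this only gives you geometry. Producing an image would still require rendering the segments yourself.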

Tilman

On 27.02.2017 at 13:20, Allison, Timothy B. wrote:
> PDFBox Colleagues,
>    Any recommendations?
>
>            Best,
>
>                   Tim
>
> -----Original Message-----
> From: Andisa Dewi [mailto:theknights91@yahoo.com]
> Sent: Monday, February 27, 2017 5:32 AM
> To: user@tika.apache.org
> Subject: Extracting vector graphics from pdf
>
> Hello guys,
>
> I'm currently extracting images from a large number of PDF files, but some of the images
> (or figures) are not extracted. I think this may be because those images are vector
> graphics (as is usually the case in scientific papers). My question is: is it possible
> to extract vector graphics from PDFs using Tika?
>
> I attached an example PDF (in it, for example, all images are extracted except Figure 2).
>
> The way I'm extracting the images is the same as in the example code:
>
> Parser parser = new AutoDetectParser();
> Metadata m = new Metadata();
> ParseContext c = new ParseContext();
> ContentHandler h = new BodyContentHandler(-1);
> PDFParserConfig pdfConfig = new PDFParserConfig();
> pdfConfig.setExtractInlineImages(true);
> c.set(PDFParserConfig.class, pdfConfig);
> c.set(Parser.class, parser);
> EmbeddedDocumentExtractor ex = new MyEmbeddedDocumentExtractor(c);
> c.set(EmbeddedDocumentExtractor.class, ex);
> parser.parse(inputstream, h, m, c);
>
>
> Thanks!
>
> Regards,
>
> Eli


