pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-1438) Problems with Image Extraction from PDF
Date Tue, 13 Nov 2012 17:24:13 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496350#comment-13496350 ]

Andreas Lehmkühler commented on PDFBOX-1438:

Your code looks good to me, although it might be easier to use the ExtractImages class [1].

The result is as expected. The PDF contains two images (one on each page), and both are extracted.
The rest of the content consists of many lines, curves, and boxes, which are drawn as vector graphics and therefore can't be extracted as images.
A possible workaround may be to convert each page to an image using PDFToImage [2].
However, the result would include the two small images as well.
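
To illustrate why a rendered page contains both the vector content and the embedded images, here is a minimal sketch that uses only the JDK (java.awt), not PDFBox. The class and method names (PageRenderSketch, embeddedImage, renderPage) are hypothetical and exist only for this example. It paints a line, a box, and a small raster onto one page-sized bitmap, which is roughly what a page-to-image conversion does.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical illustration only: this does not use or imitate the PDFBox API.
final class PageRenderSketch {

    /** Builds a solid-colour raster that stands in for an image embedded in a page. */
    static BufferedImage embeddedImage(int width, int height, Color fill) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(fill);
        g.fillRect(0, 0, width, height);
        g.dispose();
        return img;
    }

    /**
     * Rasterizes a mock page: white background, one horizontal line, one box,
     * and the embedded raster drawn at (imgX, imgY). Everything ends up in a
     * single bitmap, so the vector shapes and the image can no longer be
     * told apart in the output.
     */
    static BufferedImage renderPage(int width, int height,
                                    BufferedImage embedded, int imgX, int imgY) {
        BufferedImage page = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);          // blank page
        g.setColor(Color.BLACK);
        g.drawLine(0, 10, width - 1, 10);         // vector content: a line
        g.drawRect(20, 30, 60, 40);               // vector content: a box
        g.drawImage(embedded, imgX, imgY, null);  // raster content: the embedded image
        g.dispose();
        return page;
    }

    public static void main(String[] args) {
        BufferedImage small = embeddedImage(20, 20, Color.RED);
        BufferedImage page = renderPage(200, 200, small, 100, 100);
        // Print one pixel from the line and one from the embedded image region.
        System.out.println(Integer.toHexString(page.getRGB(5, 10)));
        System.out.println(Integer.toHexString(page.getRGB(105, 105)));
    }
}
```

In this sketch, extracting images corresponds to getting back only the small raster, while converting the page corresponds to getting the whole bitmap, lines and boxes included.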

[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java
[2] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/PDFToImage.java
> Problems with Image Extraction from PDF
> ---------------------------------------
>                 Key: PDFBOX-1438
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1438
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>         Environment: Windows XP
>            Reporter: Christian Czech
>         Attachments: Korrespondenz_000.jpg, Korrespondenz_001.jpg, Korrespondenz.PDF
> PDFBox doesn't extract images from the PDF document correctly

