pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-1438) Problems with Image Extraction from PDF
Date Tue, 13 Nov 2012 17:24:13 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496350#comment-13496350 ]

Andreas Lehmkühler commented on PDFBOX-1438:
--------------------------------------------

Your code looks good to me, although it might be easier to use the ExtractImages class. [1]

The result is as expected. The PDF contains 2 images (one on each page) and both are extracted.
The remaining content consists of many lines, curves and boxes, which can't be extracted as images.
A possible workaround may be to convert each page to an image using PDFToImage [2].
But the result would include the 2 small images as well.
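
To illustrate the difference between extracting images and rendering a page, here is a minimal toy sketch that uses only the JDK. It is not PDFBox code: Page, extractImages, renderPage and samplePage are hypothetical names made up for this example, and a filled box stands in for the lines, curves and boxes of the real document.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from PDFBox.
class PageContentSketch {

    // Hypothetical stand-in for a page: embedded raster images plus vector shapes.
    static class Page {
        final List<BufferedImage> embeddedImages = new ArrayList<>();
        final List<Shape> vectorShapes = new ArrayList<>();
    }

    // Extraction returns only the embedded raster images; vector shapes are ignored.
    static List<BufferedImage> extractImages(Page page) {
        return new ArrayList<>(page.embeddedImages);
    }

    // Rendering rasterizes the whole page, so vector shapes and embedded images
    // both end up in the resulting bitmap.
    static BufferedImage renderPage(Page page, int width, int height) {
        BufferedImage canvas = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLACK);
        for (Shape shape : page.vectorShapes) {
            g.fill(shape);
        }
        for (BufferedImage image : page.embeddedImages) {
            g.drawImage(image, 0, 0, null);
        }
        g.dispose();
        return canvas;
    }

    // Builds a sample page with one small red image and one black box.
    static Page samplePage() {
        BufferedImage red = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = red.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 10, 10);
        g.dispose();

        Page page = new Page();
        page.embeddedImages.add(red);
        page.vectorShapes.add(new Rectangle(20, 70, 60, 20));
        return page;
    }

    public static void main(String[] args) {
        Page page = samplePage();
        System.out.println("extracted images: " + extractImages(page).size());
        BufferedImage rendered = renderPage(page, 100, 100);
        System.out.println("rendered page: " + rendered.getWidth() + "x" + rendered.getHeight());
    }
}
```

In this model the box never shows up in the extraction result, while the rendered bitmap contains both the box and the small image, which mirrors the trade-off of the page-to-image workaround.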


[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java
[2] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/PDFToImage.java
                
> Problems with Image Extraction from PDF
> ---------------------------------------
>
>                 Key: PDFBOX-1438
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1438
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>         Environment: Windows XP
>            Reporter: Christian Czech
>         Attachments: Korrespondenz_000.jpg, Korrespondenz_001.jpg, Korrespondenz.PDF
>
>
> PDFBox doesn't extract images from the PDF document correctly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
