pdfbox-users mailing list archives

From David Patterson <patterd20...@gmail.com>
Subject Re: Looking for a way to iterate over images in a PDF
Date Fri, 07 Apr 2017 20:59:44 GMT
Tilman,

The ExtractImages sample code is a 1.8 artifact (I believe). It has a lot
of errors when compiled with 2.0.5 libraries.

1) two imports are no longer in the 2.0.5 library
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

2) Methods that are missing or have different signatures:
PDDocument.loadNonSeq(                          ** method not defined
PDDocument.load(                                ** load now requires a File, not a String
document.openProtection(
document.getDocumentCatalog().getAllPages()     ** getAllPages is missing from PDDocumentCatalog
resources.getXObjects()                         ** where resources is a PDResources object
if (xobject instanceof PDXObjectImage)          ** PDXObjectImage is not defined
else if (xobject instanceof PDXObjectForm)      ** same with PDXObjectForm

Maybe a new ExtractImages2 program needs to be developed for the PDFBox 2
era.
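For reference, here is a minimal sketch of what the 2.0 equivalents of the calls above could look like, assuming PDFBox 2.0.x on the classpath. As far as I know, in 2.0 PDXObjectImage and PDXObjectForm became PDImageXObject (org.apache.pdfbox.pdmodel.graphics.image) and PDFormXObject (org.apache.pdfbox.pdmodel.graphics.form), getAllPages() is replaced by PDDocument.getPages(), and getXObjects() by getXObjectNames() plus getXObject(name). The class and method names (ImageWalker, describeImages, walk) are hypothetical, and decryption (openProtection) is not covered. It recurses into form XObjects because an XObject may be a form, and a form can carry its own resources with images inside.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical "ExtractImages2"-style walker for PDFBox 2.0.x (lists images, does not save them).
public class ImageWalker {

    // One line per image XObject: resource name, pixel size, filters, suggested suffix.
    public static List<String> describeImages(PDDocument document) throws IOException {
        List<String> out = new ArrayList<>();
        // Identity set of form streams already visited, so a form that
        // (directly or indirectly) references itself cannot loop forever.
        Set<COSStream> visitedForms =
                Collections.newSetFromMap(new IdentityHashMap<COSStream, Boolean>());
        for (PDPage page : document.getPages()) {            // 1.8: getAllPages()
            walk(page.getResources(), out, visitedForms);
        }
        return out;
    }

    private static void walk(PDResources resources, List<String> out,
                             Set<COSStream> visitedForms) throws IOException {
        if (resources == null) {
            return; // no /Resources dictionary on this page or form
        }
        for (COSName name : resources.getXObjectNames()) {   // 1.8: getXObjects()
            PDXObject xobject = resources.getXObject(name);
            if (xobject instanceof PDImageXObject) {          // 1.8: PDXObjectImage
                PDImageXObject image = (PDImageXObject) xobject;
                List<String> filterNames = new ArrayList<>();
                List<COSName> filters = image.getStream().getFilters();
                if (filters != null) {
                    for (COSName filter : filters) {
                        filterNames.add(filter.getName());
                    }
                }
                out.add(name.getName() + ": " + image.getWidth() + "x" + image.getHeight()
                        + " filters=" + filterNames + " suffix=" + image.getSuffix());
            } else if (xobject instanceof PDFormXObject) {    // 1.8: PDXObjectForm
                PDFormXObject form = (PDFormXObject) xobject;
                if (visitedForms.add(form.getCOSObject())) {
                    walk(form.getResources(), out, visitedForms); // images can live inside forms
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 2.0: load from a File (or InputStream); the String overload is gone
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            for (String line : describeImages(document)) {
                System.out.println(line);
            }
        }
    }
}
```

Run it as `java ImageWalker some.pdf`. An image that is reused on several pages is listed once per page it appears on; deduplicating by the underlying stream would be a small change if only distinct images are wanted.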

Dave Patterson




On Thu, Apr 6, 2017 at 5:02 PM, Tilman Hausherr <THausherr@t-online.de> wrote:

> On 06.04.2017 at 21:22, David Patterson wrote:
>
>> I've got some PDFs to try to read. Many of them have images in them. I'd
>> like to be able to iterate over the images and determine their encoding
>> (png vs. jpeg vs. ?) and size.
>>
>> I've found a sample that lets me iterate over the PDXObject entities, but
>> I'm missing a key piece to determine the size and format of the objects.
>>
>> a) Is a PDXObject always an image, or could it be something else?
>>
>
> Yes, it could be a form. That's why all examples (e.g. ExtractImages.java)
> always check the type and then cast to the image XObject type. That one
> will give you the size and the filters.
>
> Tilman
>
>
>> Here is the code I've got so far.
>>
>> for ( PDPage aPage : pdfDocument.getPages() ) {
>>     PDResources pdResources = aPage.getResources();
>>     for ( COSName cosObject : pdResources.getXObjectNames() ) {
>>         PDXObject xObj = pdResources.getXObject( cosObject);
>>         System.out.println( "got an image maybe" );
>>
>> This is where I've gotten stumped. I've looked at lots of lists of
>> COS-whatever things, but it has not led me to "the answer."
>>
>> Thanks for any guidance you can provide.
>>
>> Dave Patterson
>>
>>
>

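On the "png vs. jpeg vs. ?" part of the question: as far as I know, a PDF does not embed PNG files as such. The encoding of an image XObject is given by its /Filter entry, and since filters are undone in the order listed, the last filter in the chain is the one that yields the image samples (DCTDecode is JPEG, JPXDecode is JPEG 2000, JBIG2Decode is JBIG2, CCITTFaxDecode is fax). Images that started out as PNGs are typically stored as FlateDecode-compressed samples. Below is a minimal sketch of that classification as a hypothetical helper (ImageEncoding.describe is not a PDFBox API), deliberately free of any PDFBox dependency. With PDFBox 2.0 the names would presumably come from image.getStream().getFilters(), converted with COSName.getName(); PDImageXObject.getSuffix() makes a similar, filter-based choice of file extension.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: classifies an image XObject's encoding from its /Filter names.
public class ImageEncoding {

    // Filters are undone in list order, so the LAST filter is the one that
    // produces the image samples and therefore identifies the stored format.
    public static String describe(List<String> filterNames) {
        if (filterNames == null || filterNames.isEmpty()) {
            return "uncompressed samples";
        }
        String last = filterNames.get(filterNames.size() - 1);
        switch (last) {
            case "DCTDecode":
                return "JPEG";
            case "JPXDecode":
                return "JPEG 2000";
            case "JBIG2Decode":
                return "JBIG2";
            case "CCITTFaxDecode":
                return "CCITT fax";
            case "FlateDecode":
            case "LZWDecode":
            case "RunLengthDecode":
                return "losslessly compressed samples";
            case "ASCIIHexDecode":
            case "ASCII85Decode":
                // only a text encoding; the bytes underneath are raw samples
                return "uncompressed samples";
            default:
                return "unknown (" + last + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Arrays.asList("DCTDecode")));
        System.out.println(describe(Arrays.asList("ASCII85Decode", "DCTDecode")));
        System.out.println(describe(Arrays.asList("FlateDecode")));
    }
}
```

The size needs no such mapping: PDImageXObject.getWidth() and getHeight() report it directly in pixels.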