pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Wed, 23 Sep 2015 17:21:40 GMT
Here's the basic code that used to work. Granted, it probably depends
heavily on Version 1's structure.


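PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
PDResources pdResources = pdPage.getResources();
// Assumed step, not in the posted code: collect the page's XObjects.
// Without it the loop below iterates over an empty map.
for (COSName name : pdResources.getXObjectNames())
  images.put(name, pdResources.getXObject(name));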

for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
{
  PDXObject pdXObject = objectImageEntry.getValue();
  if (pdXObject instanceof PDImageXObject)
  {
    PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
    BufferedImage bufferedImage = null;
    try
    {
      bufferedImage = pdXObjectImage.getImage();
    }
    catch (Throwable t)
    {
      t.printStackTrace();
      randomAccessFile.close();
      throw new RuntimeException(t);
    }
    if (CFCAPDFInputProgressBar.this.music.getLandscape())
      bufferedImage = rotate90DX(bufferedImage);
    int width = bufferedImage.getWidth();
    int height = bufferedImage.getHeight();
    if (CFCAPDFInputProgressBar.this.music.getTwoPage())
    {
      width /= 2;
      boolean even = i % 2 == 0;
      int rightPageNo = even ? i + 1 : pageCount * 2 - i;
      int leftPageNo = even ? pageCount * 2 - i : i + 1;
      putPage(bufferedImage, rightPageNo, width, 0, width, height);
      putPage(bufferedImage, leftPageNo, 0, 0, width, height);
    }
    else
    {
      int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
      putPage(bufferedImage, pageNo, 0, 0, width, height);
    }
  }
}
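
For anyone reading the two-page branch above without the rest of the application, here is a minimal JDK-only sketch of what it appears to do. The names BookletSplit, splitSpread and bookletPageNumbers are hypothetical and not part of the original program. The sketch assumes that putPage(image, pageNo, x, y, w, h) crops the rectangle at (x, y) out of the scanned spread, which the post does not confirm.

```java
import java.awt.image.BufferedImage;

public class BookletSplit {

    /** Crops a scanned two-page spread into {left half, right half}. */
    public static BufferedImage[] splitSpread(BufferedImage spread) {
        int half = spread.getWidth() / 2;
        int height = spread.getHeight();
        // getSubimage returns a view that shares pixel data with the source.
        BufferedImage left = spread.getSubimage(0, 0, half, height);
        BufferedImage right = spread.getSubimage(half, 0, half, height);
        return new BufferedImage[] { left, right };
    }

    /**
     * Returns {leftPageNo, rightPageNo} for the 0-based scan index i,
     * using the same arithmetic as the loop in the message above.
     * pageCount is the number of scanned spreads.
     */
    public static int[] bookletPageNumbers(int i, int pageCount) {
        boolean even = i % 2 == 0;
        int rightPageNo = even ? i + 1 : pageCount * 2 - i;
        int leftPageNo = even ? pageCount * 2 - i : i + 1;
        return new int[] { leftPageNo, rightPageNo };
    }

    public static void main(String[] args) {
        // A blank 200x100 image stands in for one scanned spread.
        BufferedImage spread = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage[] halves = splitSpread(spread);
        int[] pages = bookletPageNumbers(0, 2);
        System.out.println("half size: " + halves[0].getWidth() + "x" + halves[0].getHeight());
        System.out.println("left page " + pages[0] + ", right page " + pages[1]);
    }
}
```

With two scanned spreads, the first scan (i = 0) maps to the outer pages of the booklet and the second (i = 1) to the inner pages, which is the usual layout for a folded sheet.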





On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 23.09.2015 at 17:33, Tim Daley wrote:
>
>> It appears that PDFBOX 2 handles scanned documents differently than
>> PDFBOX 1.
>>
>> I have multipage PDFs that I have scanned from a Konica/Minolta C224e. The
>> PDFs in version 1 seemed to come in as a single image. Now in version 2,
>> they seem to come in as multiple images. I assume this is to reduce the
>> size of the resultant PDFs.
>>
>> Is there a way to retrieve each page as a single image or is there a
>> method to merge all the images on a page into a single image?
>>
>>
> Can't comment without having a sample PDF. And I don't know what you mean
> by "seemed to come in as a single image".
>
> Tilman
>


-- 
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org
