pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Wed, 23 Sep 2015 18:39:20 GMT
Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out of
Version 1.

On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org> wrote:

> The PDF is at the bottom of the email. Aha! PDFDebugger!
>
> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> The XObjects should be the same count in version 1 and 2.
>>
>> If you don't want to share the PDFs, then look at them with the new
>> PDFDebugger. You can see the XObject images easily.
>>
>> Tilman
>>
>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>
>>> Here's the basic code that used to work. Granted, it probably depends
>>> heavily on Version 1's structure.
>>>
>>>
>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>> // NOTE: as posted, this map is never filled from pdResources,
>>> // so the loop below iterates over an empty map.
>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>> PDResources pdResources = pdPage.getResources();
>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>> {
>>>   PDXObject pdXObject = objectImageEntry.getValue();
>>>   if (pdXObject instanceof PDImageXObject)
>>>   {
>>>     PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>     BufferedImage bufferedImage = null;
>>>     try
>>>     {
>>>       bufferedImage = pdXObjectImage.getImage();
>>>     }
>>>     catch (Throwable t)
>>>     {
>>>       t.printStackTrace();
>>>       randomAccessFile.close();
>>>       throw new RuntimeException(t);
>>>     }
>>>     // Rotate landscape scans before measuring them.
>>>     if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>       bufferedImage = rotate90DX(bufferedImage);
>>>     int width = bufferedImage.getWidth();
>>>     int height = bufferedImage.getHeight();
>>>     if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>     {
>>>       // Two-page scan: split into right and left halves.
>>>       width /= 2;
>>>       boolean even = i % 2 == 0;
>>>       int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>       int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>       putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>       putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>     }
>>>     else
>>>     {
>>>       int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>       putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>     }
>>>   }
>>> }
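
The two-page branch above halves the scan width and assigns booklet-style page numbers to the left and right halves. A minimal, standalone sketch of that arithmetic follows. It assumes putPage crops the region given by its x, y, width, and height arguments (the thread does not show putPage). SpreadSplitter, Halves, and split are hypothetical names used only for illustration.

```java
import java.awt.image.BufferedImage;

final class SpreadSplitter {

    /** Holder for the two halves of one scan and their page numbers (hypothetical). */
    static final class Halves {
        final BufferedImage left;
        final BufferedImage right;
        final int leftPageNo;
        final int rightPageNo;

        Halves(BufferedImage left, int leftPageNo, BufferedImage right, int rightPageNo) {
            this.left = left;
            this.leftPageNo = leftPageNo;
            this.right = right;
            this.rightPageNo = rightPageNo;
        }
    }

    /**
     * Splits scan number i (0-based) into halves, using the same page-number
     * formulas as the quoted code. Cropping with getSubimage is an assumption
     * about what putPage does.
     */
    static Halves split(BufferedImage spread, int i, int pageCount) {
        int width = spread.getWidth() / 2;
        int height = spread.getHeight();
        boolean even = i % 2 == 0;
        int rightPageNo = even ? i + 1 : pageCount * 2 - i;
        int leftPageNo = even ? pageCount * 2 - i : i + 1;
        // getSubimage shares pixel data with the source image; it does not copy.
        BufferedImage left = spread.getSubimage(0, 0, width, height);
        BufferedImage right = spread.getSubimage(width, 0, width, height);
        return new Halves(left, leftPageNo, right, rightPageNo);
    }

    public static void main(String[] args) {
        BufferedImage spread = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < 2; i++) {
            Halves h = split(spread, i, 4);
            System.out.println("scan " + i + ": left=" + h.leftPageNo + " right=" + h.rightPageNo);
        }
    }
}
```

With pageCount = 4, scan 0 maps to pages 8 (left) and 1 (right), and scan 1 maps to pages 2 (left) and 7 (right).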
>>>
>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>
>>>     On 23.09.2015 at 17:33, Tim Daley wrote:
>>>
>>>         It appears that PDFBOX 2 handles scanned documents differently
>>>         than PDFBOX 1.
>>>
>>>         I have multipage PDFs that I have scanned from a Konica/Minolta
>>>         C224e. The PDFs in version 1 seemed to come in as a single image.
>>>         Now in version 2, they seem to come in as multiple images. I
>>>         assume this is to reduce the size of the resultant PDFs.
>>>
>>>         Is there a way to retrieve each page as a single image or is
>>>         there a method to merge all the images on a page into a single
>>>         image?
>>>
>>>
>>>     Can't comment without having a sample PDF. And I don't know what
>>>     you mean by "seemed to come in as a single image".
>>>
>>>     Tilman
>>>
>>>     ---------------------------------------------------------------------
>>>     To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>     For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>>
>>>
>>>
>>> --
>>> *Tim Daley*
>>> IT Specialist-Operating Systems
>>> cru | Engagement & Services | Platform Team
>>> o:407-826-2911 | m:407-716-0284
>>> tim.daley@cru.org
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
>


-- 
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org
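
The original question in this thread asks whether the images on a page can be merged into a single image. A minimal sketch of one way to do that with plain java.awt follows. It assumes the page's images are horizontal strips that belong one below the other, in the order given, which the thread does not confirm. StripMerger and stackVertically are hypothetical names, and the sketch does not read image placement from the PDF.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

final class StripMerger {

    /**
     * Stacks the given images top to bottom into one RGB image.
     * The result is as wide as the widest strip and as tall as all
     * strips together; strips are left-aligned.
     */
    static BufferedImage stackVertically(List<BufferedImage> strips) {
        if (strips.isEmpty()) {
            throw new IllegalArgumentException("no images to merge");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage strip : strips) {
            width = Math.max(width, strip.getWidth());
            height += strip.getHeight();
        }
        BufferedImage merged = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = merged.createGraphics();
        try {
            int y = 0;
            for (BufferedImage strip : strips) {
                g.drawImage(strip, 0, y, null); // draw at the current vertical offset
                y += strip.getHeight();
            }
        } finally {
            g.dispose(); // release graphics resources
        }
        return merged;
    }
}
```

If the strips are not stored in top-to-bottom order, or are positioned by transforms in the page content, this simple stacking would give the wrong result.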
