pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Wed, 23 Sep 2015 20:16:08 GMT
The main gist of the program is to read in a multi-page PDF and, based on a
control file, detect what type of document it represents:

simplex/duplex
portrait/landscape
1-up/2-up (in the case of books/booklets)

In the case of books/booklets, the pages need to be split in half and the
individual pages reordered so that they are in page number order.
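For a 2-up booklet, the reordering comes down to a little arithmetic per scanned side. A minimal sketch, assuming the folded sheets were scanned outermost first, each front side then back side, so that scanned side `i` (0-based) holds two of the booklet's `2 * sideCount` pages. `BookletOrder` and `pagesForSide` are hypothetical names; the formula mirrors the even/odd page-number calculation in the code quoted further down.

```java
import java.util.Arrays;

/** Hypothetical helper: maps one scanned 2-up booklet side to its page numbers. */
final class BookletOrder {

    /**
     * Returns {leftPageNo, rightPageNo} (1-based) for scanned side i (0-based).
     * Assumes sheets are scanned outermost first, front then back, and that the
     * booklet has 2 * sideCount pages in total.
     */
    static int[] pagesForSide(int i, int sideCount) {
        int total = sideCount * 2;
        boolean even = i % 2 == 0;
        // Even (front) sides: next page on the right, its partner on the left.
        // Odd (back) sides: the mirror image.
        int right = even ? i + 1 : total - i;
        int left = even ? total - i : i + 1;
        return new int[] {left, right};
    }

    public static void main(String[] args) {
        int sideCount = 4; // e.g. two folded sheets scanned duplex -> 8 pages
        for (int i = 0; i < sideCount; i++) {
            System.out.println("side " + i + " -> " + Arrays.toString(pagesForSide(i, sideCount)));
        }
    }
}
```

Running `main` prints `side 0 -> [8, 1]`, `side 1 -> [2, 7]`, `side 2 -> [6, 3]` and `side 3 -> [4, 5]`; each pair is (left, right), so splitting every side in half and filing the halves under these numbers yields page-number order.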

The resulting pages are then rotated as necessary and output either as is
(for use on a tablet) or arranged on letter-sized pages. In the latter case,
the pages are moved to the upper right for simplex printing, or alternately
to the upper right and upper left for duplex printing.
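The letter-page layout can be reduced to computing one target rectangle per page. A minimal sketch, under several assumptions: US Letter is 612 x 792 PDF points (8.5 x 11 in at 72 points per inch), PDF coordinates have their origin at the bottom-left corner, pages larger than the sheet are scaled down proportionally (never up), and in duplex mode odd-numbered pages go to the upper right and even-numbered pages to the upper left. `LetterPlacement`, `Box` and `place` are hypothetical names.

```java
/** Hypothetical helper: where to draw one page image on a US Letter sheet. */
final class LetterPlacement {
    static final float LETTER_WIDTH = 612f;  // 8.5 in * 72 pt/in
    static final float LETTER_HEIGHT = 792f; // 11 in * 72 pt/in

    /** Lower-left corner and size of the drawn image, in PDF points. */
    static final class Box {
        final float x, y, width, height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * @param imageWidth  page width in points (after any rotation)
     * @param imageHeight page height in points
     * @param pageNo      1-based output page number
     * @param duplex      true to alternate between right and left corners
     */
    static Box place(float imageWidth, float imageHeight, int pageNo, boolean duplex) {
        // Shrink to fit the sheet if needed, keeping the aspect ratio.
        float scale = Math.min(1f, Math.min(LETTER_WIDTH / imageWidth, LETTER_HEIGHT / imageHeight));
        float w = imageWidth * scale;
        float h = imageHeight * scale;

        // Simplex: always upper right. Duplex (assumed): odd pages upper right,
        // even pages upper left.
        boolean right = !duplex || pageNo % 2 == 1;
        float x = right ? LETTER_WIDTH - w : 0f;
        float y = LETTER_HEIGHT - h; // PDF y axis points up, so "upper" means top edge
        return new Box(x, y, w, h);
    }
}
```

Sizes must be given in points, not pixels: a scan at `dpi` dots per inch converts as `pixels * 72f / dpi`. With PDFBox 2, the resulting box could then be passed to `PDPageContentStream.drawImage(image, x, y, width, height)`.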

I assume the easiest way would be to build a single image, rotate it as
necessary, resize it as necessary, and write it to a new page. If so, what
operations are required to assemble all the source images into a single
image? Or is there an easier way to do this with PDFBox?
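Assembling several image strips into one page image, then rotating and resizing it, can be done with plain Java2D. A minimal sketch, assuming the pixel position of each strip on the page is already known (in a real PDF it would have to be derived from how the page's content stream places each image, which this sketch does not attempt). `PageImages`, `Tile`, `compose`, `rotate90Clockwise` and `resize` are hypothetical names. If one raster per page is all that is needed, PDFBox 2's `PDFRenderer.renderImageWithDPI` renders a whole page, all of its images included, into a single `BufferedImage`, which may be the simpler route.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical helpers: merge image strips into one page image, rotate, resize. */
final class PageImages {

    /** A source image and the pixel position of its top-left corner on the page. */
    static final class Tile {
        final BufferedImage image;
        final int x;
        final int y;

        Tile(BufferedImage image, int x, int y) {
            this.image = image;
            this.x = x;
            this.y = y;
        }
    }

    /** Draws every tile onto one white canvas of the given size. */
    static BufferedImage compose(List<Tile> tiles, int width, int height) {
        BufferedImage page = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height); // uncovered areas stay white, like paper
            for (Tile t : tiles) {
                g.drawImage(t.image, t.x, t.y, null);
            }
        } finally {
            g.dispose();
        }
        return page;
    }

    /** Rotates 90 degrees clockwise: source (x, y) moves to (h - 1 - y, x). */
    static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    /** Scales an image to the given size using bilinear interpolation. */
    static BufferedImage resize(BufferedImage src, int width, int height) {
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The rotation copies pixels one at a time for clarity; for large scans, drawing through an `AffineTransform` is usually faster.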

On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:

> It's on https://www.daley.ws/Believe.pdf
>
> I found PDFDebugger, thanks!
>
> Now that I look at my code again, it looks like I am reading a list of
> images.
>
> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>
>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>
>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out
>>> of Version 1.
>>>
>>
>> You can't attach PDF files. Upload them somewhere.
>>
>> PDFDebugger is there; I even made a change to it earlier today! It is part
>> of the PDFBox app jar.
>>
>> Tilman
>>
>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>
>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>
>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>
>>>>> The number of XObjects should be the same in versions 1 and 2.
>>>>>
>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>
>>>>> Tilman
>>>>>
>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>
>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>> heavily on Version 1's structure.
>>>>>>
>>>>>>
>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>
>>>>>> // Fill the map from the page resources (PDFBox 2 API); left empty,
>>>>>> // the loop below would never run.
>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>> for (COSName name : pdResources.getXObjectNames())
>>>>>> {
>>>>>>     images.put(name, pdResources.getXObject(name));
>>>>>> }
>>>>>>
>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>> {
>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>     {
>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>         BufferedImage bufferedImage;
>>>>>>         try
>>>>>>         {
>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>         }
>>>>>>         catch (IOException e)
>>>>>>         {
>>>>>>             randomAccessFile.close();
>>>>>>             throw new RuntimeException(e);
>>>>>>         }
>>>>>>
>>>>>>         // Landscape scans are turned upright first.
>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>
>>>>>>         int width = bufferedImage.getWidth();
>>>>>>         int height = bufferedImage.getHeight();
>>>>>>
>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>         {
>>>>>>             // 2-up booklet spread: split into left and right halves
>>>>>>             // and map each half to its page number.
>>>>>>             width /= 2;
>>>>>>             boolean even = i % 2 == 0;
>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>         }
>>>>>>         else
>>>>>>         {
>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>
>>>>>>      On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>
>>>>>>          It appears that PDFBOX 2 handles scanned documents differently
>>>>>>          than PDFBOX 1.
>>>>>>
>>>>>>          I have multipage PDFs that I have scanned from a
>>>>>>          Konica/Minolta C224e. The PDFs in version 1 seemed to come in
>>>>>>          as a single image. Now in version 2, they seem to come in as
>>>>>>          multiple images. I assume this is to reduce the size of the
>>>>>>          resultant PDFs.
>>>>>>
>>>>>>          Is there a way to retrieve each page as a single image, or is
>>>>>>          there a method to merge all the images on a page into a single
>>>>>>          image?
>>>>>>
>>>>>>
>>>>>>      Can't comment without having a sample PDF. And I don't know what
>>>>>>      you mean by "seemed to come in as a single image".
>>>>>>
>>>>>>      Tilman
>>>>>>
>>>>>>
>>>>>>  ---------------------------------------------------------------------
>>>>>>      To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>      For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>
>>>>>> --
>>>>>> *Tim Daley*
>>>>>> IT Specialist-Operating Systems
>>>>>> cru | Engagement & Services | Platform Team
>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>> tim.daley@cru.org