pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Thu, 24 Sep 2015 02:11:36 GMT
That's what I was looking for!!!

Thanks!

On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 23.09.2015 at 22:16, Tim Daley wrote:
>
>> The main gist of the program is to read in a multi-page pdf. Based on a
>> control file, detect what type of document this represents:
>>
>> simplex/duplex
>> portrait/landscape
>> 1-up/2-up (in the case of books/booklets)
>>
>> In the case of books/booklets, the pages need to be split in half and the
>> individual pages reordered so that they are in page number order.
>>
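A minimal sketch of the split-and-reorder step described above, using only the JDK. The class and method names (`BookletSplitter`, `pageNumbers`, `splitInHalf`) are hypothetical. It assumes a saddle-stitched booklet whose sheet sides were scanned in order (front, back, front, ...) and that each scanned side is already available as a `BufferedImage`. The page-number arithmetic mirrors the `even ? i+1 : pageCount*2-i` logic in the code quoted further down.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class BookletSplitter {

    /**
     * Returns {leftPageNo, rightPageNo} (1-based) for one scanned side.
     * sideIndex is 0-based; sideCount is the number of scanned sides.
     */
    static int[] pageNumbers(int sideIndex, int sideCount) {
        int totalPages = sideCount * 2;
        int low = sideIndex + 1;            // page counted from the front of the booklet
        int high = totalPages - sideIndex;  // page counted from the back of the booklet
        boolean front = sideIndex % 2 == 0;
        // Front sides carry the high page on the left, back sides on the right.
        return front ? new int[] {high, low} : new int[] {low, high};
    }

    /** Splits a 2-up image into {left, right}; the halves share pixel data with the source. */
    static BufferedImage[] splitInHalf(BufferedImage spread) {
        int half = spread.getWidth() / 2;
        int height = spread.getHeight();
        return new BufferedImage[] {
            spread.getSubimage(0, 0, half, height),
            spread.getSubimage(half, 0, half, height)
        };
    }

    public static void main(String[] args) {
        int sideCount = 4; // two sheets scanned on both sides: an 8-page booklet
        for (int i = 0; i < sideCount; i++) {
            int[] p = pageNumbers(i, sideCount);
            System.out.println("side " + i + ": left=" + p[0] + " right=" + p[1]);
        }
    }
}
```

For the 8-page case this pairs the sides as 8|1, 2|7, 6|3, 4|5, so sorting the halves by page number puts them in reading order.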
>> The resulting pages are then rotated as necessary and output either as is
>> (for use on a tablet), or arranged on letter sized pages. In this latter
>> case, the pages are moved to the upper right for simplex printing or
>> alternately upper right and upper left for duplex printing.
>>
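The placement rule above can be sketched as a small coordinate calculation. This is only an illustration under stated assumptions: a US letter page of 612 x 792 points, PDF user space with the origin at the bottom-left corner, and, for duplex, even output indexes (0, 2, ...) going to the upper right and odd ones to the upper left (the message does not say which side comes first). `LetterPlacement` and `lowerLeftCorner` are hypothetical names.

```java
// Hypothetical helper, not part of PDFBox.
final class LetterPlacement {
    static final float LETTER_WIDTH = 612f;   // 8.5 in * 72 pt
    static final float LETTER_HEIGHT = 792f;  // 11 in * 72 pt

    /**
     * Returns {x, y}: the lower-left corner at which a page image of the given
     * size (in points) should be drawn so that it sits in an upper corner.
     */
    static float[] lowerLeftCorner(float width, float height, int outputIndex, boolean duplex) {
        // Simplex: always upper right. Duplex: alternate right, left, right, ...
        boolean upperRight = !duplex || outputIndex % 2 == 0;
        float x = upperRight ? LETTER_WIDTH - width : 0f;
        float y = LETTER_HEIGHT - height; // top edge of the image touches the top of the page
        return new float[] {x, y};
    }

    public static void main(String[] args) {
        float[] front = lowerLeftCorner(396f, 612f, 0, true);
        float[] back = lowerLeftCorner(396f, 612f, 1, true);
        System.out.println("front: " + front[0] + ", " + front[1]);
        System.out.println("back:  " + back[0] + ", " + back[1]);
    }
}
```

In PDFBox 2 the resulting coordinates could be handed to something like `PDPageContentStream.drawImage(image, x, y, width, height)`. That call is not exercised here.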
>> I assume the easiest way would be to build a single image, rotate the image
>> as necessary, resize the image as necessary, and write the image to a new
>> page. If so, what operations are required to assemble all the source images
>> into a single image? Or is there an easier way to do this with PDFBox?
>>
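For the "rotate the image as necessary" step, a quarter turn on a `BufferedImage` can be done with the JDK alone. The sketch below assumes a clockwise rotation is what is wanted. `ImageRotation.rotate90Clockwise` is a hypothetical name and is not the `rotate90DX` helper from the quoted code, whose body was never posted.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class ImageRotation {

    /** Returns a new image that is the source rotated 90 degrees clockwise. */
    static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap places after a quarter turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // The top-left source pixel ends up in the top-right corner.
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage landscape = new BufferedImage(300, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage portrait = rotate90Clockwise(landscape);
        System.out.println(portrait.getWidth() + " x " + portrait.getHeight());
    }
}
```

A per-pixel copy is slow for large scans; an `AffineTransform` drawn through `Graphics2D` is the usual faster route, at the cost of more care with the translation offsets.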
>
> Ah, I think I understand: you look at the images in the resources, and
> based on the width / height ratio, you make a decision.
> However there's no guarantee that the images will come in a certain
> sequence.
>
> Why not simply render the PDF pages?
>
>
> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>
>
> Tilman
>
>
>
>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:
>>
>> It's on https://www.daley.ws/Believe.pdf
>>>
>>> I found PDFDebugger, thanks!
>>>
>>> Now that I look at my code again, it looks like I am reading a list of
>>> images.
>>>
>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de>
>>> wrote:
>>>
>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>
>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out of
>>>>> Version 1.
>>>>>
>>>>> You can't attach PDF files. Upload them somewhere.
>>>>
>>>> PDFDebugger is there, I even made a change earlier today! It is part of
>>>> the PDFBox app jar.
>>>>
>>>> Tilman
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>>
>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>
>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>> wrote:
>>>>>>
>>>>>> The XObjects should be the same count in version 1 and 2.
>>>>>>
>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>
>>>>>>> Tilman
>>>>>>>
>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>
>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>> heavily on Version 1's structure.
>>>>>>>>
>>>>>>>>
>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>> // NOTE: nothing in this excerpt adds entries to 'images'.
>>>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>> {
>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>     {
>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>         try
>>>>>>>>         {
>>>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>>>         }
>>>>>>>>         catch (Throwable t)
>>>>>>>>         {
>>>>>>>>             t.printStackTrace();
>>>>>>>>             randomAccessFile.close();
>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>         }
>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>         {
>>>>>>>>             // 2-up scan: each half of the image is one booklet page
>>>>>>>>             width /= 2;
>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>         }
>>>>>>>>         else
>>>>>>>>         {
>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>       On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>
>>>>>>>>           It appears that PDFBOX 2 handles scanned documents differently
>>>>>>>>           than PDFBOX 1.
>>>>>>>>
>>>>>>>>           I have multipage PDFs that I have scanned from a Konica/Minolta
>>>>>>>>           C224e. The PDFs in version 1 seemed to come in as a single image.
>>>>>>>>           Now in version 2, they seem to come in as multiple images. I
>>>>>>>>           assume this is to reduce the size of the resultant PDFs.
>>>>>>>>
>>>>>>>>           Is there a way to retrieve each page as a single image or is
>>>>>>>>           there a method to merge all the images on a page into a single
>>>>>>>>           image?
>>>>>>>>
>>>>>>>>
>>>>>>>>       Can't comment without having a sample PDF. And I don't know what
>>>>>>>>       you mean by "seemed to come in as a single image".
>>>>>>>>
>>>>>>>>       Tilman
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>       To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>       For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Tim Daley*
>>>>>>>> IT Specialist-Operating Systems
>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>> tim.daley@cru.org
>>>>>>>>
>>>>
>>>>
>>>>


-- 
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org
