pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Thu, 24 Sep 2015 02:22:16 GMT
Another note: when I was using PDFDebugger, I could get it to load a URL,
but when I tried the file selection menu, all the PDF files were grayed
out.

I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.

On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley <tim.daley@cru.org> wrote:

> That's what I was looking for!!!
>
> Thanks!
>
> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>
>>> The main gist of the program is to read in a multi-page pdf. Based on a
>>> control file, detect what type of document this represents:
>>>
>>> simplex/duplex
>>> portrait/landscape
>>> 1-up/2-up (in the case of books/booklets)
>>>
>>> In the case of books/booklets, the pages need to be split in half and the
>>> individual pages reordered so that they are in page number order.
>>>
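The reordering step can be written as a small pure function. The sketch below restates the page-number formula from the code quoted further down in this thread, assuming each scanned side is a 2-up sheet, `scanCount` sides in total and a 0-based `scanIndex`. It also adds a helper that cuts a sheet image into its left and right halves with `BufferedImage.getSubimage`. `BookletSplit`, `bookletPageNumbers` and `splitHalves` are hypothetical names.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helpers for un-imposing 2-up booklet scans. */
public final class BookletSplit {

    private BookletSplit() {
    }

    /**
     * Returns {leftPageNo, rightPageNo} (1-based) for the 0-based scanned side
     * scanIndex out of scanCount sides. Same rule as the quoted code: on even
     * sides the right half holds page scanIndex + 1, on odd sides the left half does.
     */
    public static int[] bookletPageNumbers(int scanIndex, int scanCount) {
        int totalPages = scanCount * 2;
        boolean even = scanIndex % 2 == 0;
        int rightPageNo = even ? scanIndex + 1 : totalPages - scanIndex;
        int leftPageNo = even ? totalPages - scanIndex : scanIndex + 1;
        return new int[] {leftPageNo, rightPageNo};
    }

    /** Splits a 2-up sheet image into {leftHalf, rightHalf}; an odd extra column goes right. */
    public static BufferedImage[] splitHalves(BufferedImage sheet) {
        int half = sheet.getWidth() / 2;
        int height = sheet.getHeight();
        // getSubimage shares pixel data with the sheet; no copy is made.
        BufferedImage left = sheet.getSubimage(0, 0, half, height);
        BufferedImage right = sheet.getSubimage(half, 0, sheet.getWidth() - half, height);
        return new BufferedImage[] {left, right};
    }

    public static void main(String[] args) {
        // An 8-page booklet printed duplex on 2 sheets gives 4 scanned sides.
        for (int i = 0; i < 4; i++) {
            int[] pages = bookletPageNumbers(i, 4);
            System.out.println("side " + i + ": left=" + pages[0] + " right=" + pages[1]);
        }
    }
}
```

With four sides, `main` prints the left/right pairs 8/1, 2/7, 6/3 and 4/5, so every page from 1 to 8 appears exactly once. Because the halves share pixels with the original sheet, copy them if the sheet is modified afterwards.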
>>> The resulting pages are then rotated as necessary and output either as is
>>> (for use on a tablet), or arranged on letter-sized pages. In this latter
>>> case, the pages are moved to the upper right for simplex printing or
>>> alternately upper right and upper left for duplex printing.
>>>
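For the rotate-and-place step, a minimal sketch in plain Java 2D follows. It assumes US Letter output (612 x 792 points), image sizes already converted to points, and, for duplex, that fronts are the even 0-based output pages (upper right) and backs the odd ones (upper left). `LetterLayout`, `rotate90Clockwise` and `placementOrigin` are hypothetical names, not PDFBox API.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helpers for turning a page image upright and placing it on a letter page. */
public final class LetterLayout {

    /** US Letter in PDF points (1 pt = 1/72 inch): 8.5 x 11 in. */
    public static final float LETTER_WIDTH = 612f;
    public static final float LETTER_HEIGHT = 792f;

    private LetterLayout() {
    }

    /** Rotates an image 90 degrees clockwise: source (x, y) moves to (h - 1 - y, x). */
    public static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    /**
     * Lower-left corner {x, y} for an image of the given size in points
     * (PDF coordinates start at the bottom-left of the page). Simplex: always
     * upper right. Duplex (assumed): even output pages upper right, odd upper left.
     */
    public static float[] placementOrigin(float width, float height,
                                          int outputPageIndex, boolean duplex) {
        boolean upperLeft = duplex && outputPageIndex % 2 == 1;
        float x = upperLeft ? 0f : LETTER_WIDTH - width;
        float y = LETTER_HEIGHT - height;
        return new float[] {x, y};
    }
}
```

Rotating the other way maps (x, y) to (y, w - 1 - x). In PDFBox 2.0 the returned origin would typically become the `x` and `y` arguments of `PDPageContentStream.drawImage(image, x, y, width, height)`.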
>>> I assume the easiest way would be to build a single image, rotate the image
>>> as necessary, resize the image as necessary and write the image to a new
>>> page. If so, what operations are required to assemble all the source images
>>> into a single image? Or is there an easier way to do this with PDFBox?
>>>
>>
>> Ah, I think I understand: you look at the images in the resources, and
>> based on the width / height ratio, you make a decision.
>> However there's no guarantee that the images will come in a certain
>> sequence.
>>
>> Why not simply render the PDF pages?
>>
>>
>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>
>>
>> Tilman
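Rendering sidesteps the strip-ordering problem: the renderer draws every image of a page at the position its content stream specifies and returns one bitmap per page. In PDFBox 2.0 this is done with `PDFRenderer.renderImageWithDPI(pageIndex, dpi)`, as in the linked answer. The bitmap is roughly the page size in points scaled by dpi/72; the hypothetical helper below only does that arithmetic, e.g. to choose a DPI for tablet or letter output. PDFBox's own rounding may differ by a pixel.

```java
/** Hypothetical helper: approximate pixel size of a page rendered at a given DPI. */
public final class RenderSize {

    private RenderSize() {
    }

    /** PDF sizes are in points (1/72 inch), so pixels = points * dpi / 72. */
    public static int[] pixelSize(float widthPt, float heightPt, float dpi) {
        double scale = dpi / 72.0;
        int widthPx = (int) Math.round(widthPt * scale);
        int heightPx = (int) Math.round(heightPt * scale);
        return new int[] {widthPx, heightPx};
    }

    public static void main(String[] args) {
        // A US Letter page (612 x 792 pt) rendered at 150 DPI.
        int[] letter = pixelSize(612f, 792f, 150f);
        System.out.println(letter[0] + " x " + letter[1]);
    }
}
```

At 150 DPI a letter page comes out at 1275 x 1650 pixels; at 300 DPI it doubles to 2550 x 3300.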
>>
>>
>>
>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>
>>> It's on https://www.daley.ws/Believe.pdf
>>>>
>>>> I found PDFDebugger, thanks!
>>>>
>>>> Now that I look at my code again, it looks like I am reading a list of
>>>> images.
>>>>
>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de>
>>>> wrote:
>>>>
>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>
>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out of
>>>>>> Version 1.
>>>>>>
>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>
>>>>> PDFDebugger is there; I even made a change to it earlier today! It is part
>>>>> of the PDFBox app jar.
>>>>>
>>>>> Tilman
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>>>
>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The number of XObjects should be the same in versions 1 and 2.
>>>>>>>>
>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>>
>>>>>>>> Tilman
>>>>>>>>
>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>
>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>>> heavily on Version 1's structure.
>>>>>>>>>
>>>>>>>>> // Imports needed (PDFBox 2.0 package names):
>>>>>>>>> import java.awt.image.BufferedImage;
>>>>>>>>> import java.util.Map;
>>>>>>>>> import java.util.Map.Entry;
>>>>>>>>> import java.util.TreeMap;
>>>>>>>>> import org.apache.pdfbox.cos.COSName;
>>>>>>>>> import org.apache.pdfbox.pdmodel.PDPage;
>>>>>>>>> import org.apache.pdfbox.pdmodel.PDResources;
>>>>>>>>> import org.apache.pdfbox.pdmodel.graphics.PDXObject;
>>>>>>>>> import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
>>>>>>>>>
>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>
>>>>>>>>> // Collect the page's XObjects, sorted by resource name. The map has to
>>>>>>>>> // be filled from the resources, otherwise the loop below never runs.
>>>>>>>>> // getXObject() throws IOException.
>>>>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>>>>> for (COSName name : pdResources.getXObjectNames())
>>>>>>>>> {
>>>>>>>>>     images.put(name, pdResources.getXObject(name));
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>> {
>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>     {
>>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>>         try
>>>>>>>>>         {
>>>>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>>>>         }
>>>>>>>>>         catch (Throwable t)
>>>>>>>>>         {
>>>>>>>>>             t.printStackTrace();
>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>         }
>>>>>>>>>         // Landscape scans are turned upright first.
>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>         {
>>>>>>>>>             // 2-up booklet sheet: split into halves and number each half.
>>>>>>>>>             width /= 2;
>>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>         }
>>>>>>>>>         else
>>>>>>>>>         {
>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>
>>>>>>>>>       On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>
>>>>>>>>>           It appears that PDFBOX 2 handles scanned documents differently
>>>>>>>>>           than PDFBOX 1.
>>>>>>>>>
>>>>>>>>>           I have multi-page PDFs that I have scanned from a
>>>>>>>>>           Konica/Minolta C224e. The PDFs in version 1 seemed to come in
>>>>>>>>>           as a single image. Now, in version 2, they seem to come in as
>>>>>>>>>           multiple images. I assume this is to reduce the size of the
>>>>>>>>>           resultant PDFs.
>>>>>>>>>
>>>>>>>>>           Is there a way to retrieve each page as a single image, or is
>>>>>>>>>           there a method to merge all the images on a page into a single
>>>>>>>>>           image?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       I can't comment without a sample PDF. And I don't know what
>>>>>>>>>       you mean by "seemed to come in as a single image".
>>>>>>>>>
>>>>>>>>>       Tilman
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>       To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>       <mailto:users-unsubscribe@pdfbox.apache.org>
>>>>>>>>>       For additional commands, e-mail:
>>>>>>>>> users-help@pdfbox.apache.org
>>>>>>>>>       <mailto:users-help@pdfbox.apache.org>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Tim Daley*
>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>>> tim.daley@cru.org <mailto:tim.daley@cru.org>
>>>>>>>>>
>>


