pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDFBOX 2 scanned documents
Date Thu, 24 Sep 2015 06:36:17 GMT
On 24.09.2015 at 04:22, Tim Daley wrote:
> Another note. When I was using PDFDebugger, I could get it to load a URL,
> but when I tried the file selection menu, all the PDF files were grayed
> out.
>
> I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.

Weird... it worked for me (on Windows). Another possibility is to use
drag and drop.

Tilman

> On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley <tim.daley@cru.org> wrote:
>
>> That's what I was looking for!!!
>>
>> Thanks!
>>
>> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>
>>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>>
>>>> The main gist of the program is to read in a multi-page PDF and, based on
>>>> a control file, detect what type of document it represents:
>>>>
>>>> simplex/duplex
>>>> portrait/landscape
>>>> 1-up/2-up (in the case of books/booklets)
>>>>
>>>> In the case of books/booklets, the pages need to be split in half and the
>>>> individual pages reordered so that they are in page number order.
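The split-and-reorder step above can be sketched in plain Java (java.awt only). This is a minimal sketch under stated assumptions: a saddle-stitched booklet scanned one side at a time, sides numbered from 0, and `pageCount` scanned sides in total, giving 2 * `pageCount` booklet pages. The page-number formula is the one in the code Tim posts further down the thread; the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

/**
 * Hypothetical helper: split one 2-up booklet scan into its halves and
 * compute the 1-based booklet page number of each half.
 * Assumes pageCount scanned sides (so 2 * pageCount booklet pages),
 * with scanned sides numbered i = 0, 1, 2, ...
 */
final class BookletSplitter {

    /** Page number printed on the right half of scanned side i. */
    static int rightPageNo(int i, int pageCount) {
        return (i % 2 == 0) ? i + 1 : pageCount * 2 - i;
    }

    /** Page number printed on the left half of scanned side i. */
    static int leftPageNo(int i, int pageCount) {
        return (i % 2 == 0) ? pageCount * 2 - i : i + 1;
    }

    /**
     * Returns {left, right}. getSubimage shares pixel data with the
     * original; an odd-width scan loses its last column.
     */
    static BufferedImage[] splitHalves(BufferedImage spread) {
        int half = spread.getWidth() / 2;
        int h = spread.getHeight();
        BufferedImage left = spread.getSubimage(0, 0, half, h);
        BufferedImage right = spread.getSubimage(half, 0, half, h);
        return new BufferedImage[] {left, right};
    }
}
```

With `pageCount` = 4 (an 8-page booklet from two nested sheets), sides 0 to 3 give left|right pairs 8|1, 2|7, 6|3 and 4|5.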
>>>>
>>>> The resulting pages are then rotated as necessary and output either as is
>>>> (for use on a tablet) or arranged on letter-sized pages. In the latter
>>>> case, the pages are moved to the upper right for simplex printing, or
>>>> alternately to the upper right and upper left for duplex printing.
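The placement rule can be expressed as a small hypothetical helper (not a PDFBox API). It assumes US Letter at 612 x 792 PDF points (8.5 x 11 in at 72 points per inch), PDF user space with the origin at the bottom-left, and that odd output pages are fronts (upper right) while even pages are backs (upper left). Flipping a sheet on its long edge maps the front's upper-right corner onto the back's upper-left corner, so the two images end up back to back.

```java
/**
 * Hypothetical helper: where to put a (scaled) page image on a US Letter
 * sheet. Coordinates are PDF points with the origin at the bottom-left.
 * Simplex: always upper right. Duplex (assumed): odd output pages upper
 * right, even output pages upper left.
 */
final class LetterPlacement {
    static final float LETTER_WIDTH = 612f;  // 8.5 in * 72 pt/in
    static final float LETTER_HEIGHT = 792f; // 11 in * 72 pt/in

    /** Returns {x, y} of the image's lower-left corner. */
    static float[] lowerLeft(float imgWidth, float imgHeight,
                             int outputPageNo, boolean duplex) {
        boolean upperLeft = duplex && outputPageNo % 2 == 0;
        float x = upperLeft ? 0f : LETTER_WIDTH - imgWidth;
        float y = LETTER_HEIGHT - imgHeight; // flush with the top edge
        return new float[] {x, y};
    }

    /** Largest scale factor (never above 1) that fits w x h on the sheet. */
    static float fitScale(float w, float h) {
        return Math.min(1f, Math.min(LETTER_WIDTH / w, LETTER_HEIGHT / h));
    }
}
```

In PDFBox 2.x, the returned corner and the scaled size could then be passed to `PDPageContentStream.drawImage(image, x, y, width, height)`.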
>>>>
>>>> I assume the easiest way would be to build a single image, rotate the
>>>> image as necessary, resize it as necessary and write it to a new page.
>>>> If so, what operations are required to assemble all the source images
>>>> into a single image? Or is there an easier way to do this with PDFBox?
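For the image route, rotating and resizing need only java.awt. A minimal sketch: `rotate90Clockwise` stands in for the `rotate90DX` helper called in the code later in the thread, whose implementation was not posted, so the rotation direction here is an assumption. Assembling several image XObjects into one picture additionally requires each image's position on the page, which is set in the page's content stream rather than stored in the resources.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/**
 * Sketch of the rotate and resize steps with plain java.awt.
 * Output images are TYPE_INT_RGB, so any alpha channel is dropped.
 */
final class ImageOps {

    /** Quarter turn clockwise: source pixel (x, y) lands at (h - 1 - y, x). */
    static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    /** Scales to exactly newW x newH using bilinear interpolation. */
    static BufferedImage scale(BufferedImage src, int newW, int newH) {
        BufferedImage dst = new BufferedImage(newW, newH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newW, newH, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The per-pixel loop is exact but slow on large scans; `java.awt.image.AffineTransformOp` with `TYPE_NEAREST_NEIGHBOR` is a faster option for the same quarter turn.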
>>>>
>>> Ah, I think I understand: you look at the images in the resources, and
>>> based on the width / height ratio, you make a decision.
>>> However, there's no guarantee that the images will come in any particular
>>> order.
>>>
>>> Why not simply render the PDF pages?
>>>
>>>
>>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>>
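Rendering the page sidesteps both the ordering problem and the question of how to assemble the pieces: the renderer draws every image at the position the content stream gives it. A minimal sketch for PDFBox 2.x follows; the class name, the DPI and the RGB image type are choices for illustration, and in PDFBox 3.x `PDDocument.load(File)` is replaced by `Loader.loadPDF(File)`.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

/**
 * Sketch (PDFBox 2.x): one BufferedImage per page, with all image
 * XObjects on the page already composited in place.
 */
final class PageRenderer {

    static List<BufferedImage> renderAll(PDDocument document, float dpi) throws IOException {
        PDFRenderer renderer = new PDFRenderer(document);
        List<BufferedImage> pages = new ArrayList<>();
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // Page index is 0-based; 72 dpi means one pixel per PDF point.
            pages.add(renderer.renderImageWithDPI(i, dpi, ImageType.RGB));
        }
        return pages;
    }

    static List<BufferedImage> renderFile(File pdf, float dpi) throws IOException {
        // The rendered images stay valid after the document is closed.
        try (PDDocument document = PDDocument.load(pdf)) {
            return renderAll(document, dpi);
        }
    }
}
```

At 72 dpi a Letter page comes out at 612 x 792 pixels; 300 dpi is a more usual choice for scanned pages. The resulting images can then go through the split, rotate and placement steps Tim describes.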
>>>
>>> Tilman
>>>
>>>
>>>
>>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>
>>>> It's on https://www.daley.ws/Believe.pdf
>>>>> I found PDFDebugger, thanks!
>>>>>
>>>>> Now that I look at my code again, it looks like I am reading a list of
>>>>> images.
>>>>>
>>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>> wrote:
>>>>>
>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out
>>>>>> of Version 1.
>>>>>>>
>>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>> PDFDebugger is there; I even made a change to it earlier today! It is
>>>>>> part of the PDFBox app jar.
>>>>>>
>>>>>> Tilman
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The number of XObjects should be the same in versions 1 and 2.
>>>>>>>>>
>>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>>> PDFDebugger. You can see the XObject images easily.
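As an alternative to inspecting each page in PDFDebugger, a short PDFBox 2.x sketch can count the image XObjects listed in each page's resources, which makes a version 1 versus version 2 comparison easy to script. It only looks at the page's (possibly inherited) /Resources entry, so images nested inside form XObjects are not counted; the class name is hypothetical.

```java
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

/**
 * Sketch (PDFBox 2.x): count image XObjects per page, roughly what
 * PDFDebugger shows under a page's Resources/XObject entry.
 */
final class XObjectCounter {

    static int countImages(PDPage page) {
        PDResources resources = page.getResources();
        if (resources == null) {
            return 0; // a page without resources references no images
        }
        int count = 0;
        for (COSName name : resources.getXObjectNames()) {
            if (resources.isImageXObject(name)) {
                count++;
            }
        }
        return count;
    }

    static void printCounts(PDDocument document) {
        int pageNo = 1;
        for (PDPage page : document.getPages()) {
            System.out.println("page " + pageNo++ + ": "
                    + countImages(page) + " image XObject(s)");
        }
    }
}
```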
>>>>>>>>>
>>>>>>>>> Tilman
>>>>>>>>>
>>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>>
>>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>>>> heavily on Version 1's structure.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> // PDFBox 2.x types used: org.apache.pdfbox.cos.COSName,
>>>>>>>>>> // org.apache.pdfbox.pdmodel.PDPage, org.apache.pdfbox.pdmodel.PDResources,
>>>>>>>>>> // org.apache.pdfbox.pdmodel.graphics.PDXObject,
>>>>>>>>>> // org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject,
>>>>>>>>>> // plus java.awt.image.BufferedImage and java.util.{Map, Map.Entry, TreeMap}.
>>>>>>>>>> // i, pageCount, randomAccessFile, putPage() and rotate90DX() come from
>>>>>>>>>> // the enclosing class and were not part of the post.
>>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>>
>>>>>>>>>> // Collect the page's XObjects by name. As posted, the map was created
>>>>>>>>>> // but never filled, so the loop below could never run.
>>>>>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>>>>>> for (COSName name : pdResources.getXObjectNames())
>>>>>>>>>> {
>>>>>>>>>>     images.put(name, pdResources.getXObject(name)); // may throw IOException
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> // Note: if the scanner stored one page as several image XObjects,
>>>>>>>>>> // each piece is handled separately here.
>>>>>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>>> {
>>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>>     {
>>>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>>         BufferedImage bufferedImage;
>>>>>>>>>>         try
>>>>>>>>>>         {
>>>>>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>>>>>         }
>>>>>>>>>>         catch (Throwable t)
>>>>>>>>>>         {
>>>>>>>>>>             t.printStackTrace();
>>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>>         }
>>>>>>>>>>         // Landscape scans are turned upright first
>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>>         {
>>>>>>>>>>             // 2-up booklet side: each half is one page; map both
>>>>>>>>>>             // halves to their 1-based page numbers in reading order
>>>>>>>>>>             width /= 2;
>>>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>>         }
>>>>>>>>>>         else
>>>>>>>>>>         {
>>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>>         }
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>>
>>>>>>>>>>        On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>
>>>>>>>>>>            It appears that PDFBOX 2 handles scanned documents
>>>>>>>>>>            differently than PDFBOX 1.
>>>>>>>>>>
>>>>>>>>>>            I have multipage PDFs that I have scanned from a
>>>>>>>>>>            Konica/Minolta C224e. In version 1, the PDFs seemed to
>>>>>>>>>>            come in as a single image. Now in version 2, they seem to
>>>>>>>>>>            come in as multiple images. I assume this is to reduce the
>>>>>>>>>>            size of the resultant PDFs.
>>>>>>>>>>
>>>>>>>>>>            Is there a way to retrieve each page as a single image, or
>>>>>>>>>>            is there a method to merge all the images on a page into a
>>>>>>>>>>            single image?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>        Can't comment without having a sample PDF. And I don't know
>>>>>>>>>>        what you mean by "seemed to come in as a single image".
>>>>>>>>>>
>>>>>>>>>>        Tilman
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>        To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>>        <mailto:users-unsubscribe@pdfbox.apache.org>
>>>>>>>>>>        For additional commands, e-mail:
>>>>>>>>>> users-help@pdfbox.apache.org
>>>>>>>>>>        <mailto:users-help@pdfbox.apache.org>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Tim Daley*
>>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>>>> tim.daley@cru.org <mailto:tim.daley@cru.org>
>>>>>>>>>>

