pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Thu, 24 Sep 2015 23:52:17 GMT
Your responsiveness has been refreshing.

On Thu, Sep 24, 2015 at 7:50 PM, Tim Daley <tim.daley@cru.org> wrote:

> I checked the Maven source for the package and it only contained metadata.
> I would have tried to check it out.
>
> On Thu, Sep 24, 2015 at 2:36 AM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> On 24.09.2015 at 04:22, Tim Daley wrote:
>>
>>> Another note. When I was using PDFDebugger, I could get it to load a URL,
>>> but when I tried the file selection menu, all the PDF files were grayed
>>> out.
>>>
>>> I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.
>>>
>>
>> Weird... It worked for me (on windows)... Another possibility is to use
>> drag and drop.
>>
>>
>> Tilman
>>
>> On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>
>>> That's what I was looking for!!!
>>>>
>>>> Thanks!
>>>>
>>>> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>
>>>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>>>>
>>>>>> The main gist of the program is to read in a multi-page PDF. Based on a
>>>>>> control file, detect what type of document this represents:
>>>>>>
>>>>>> simplex/duplex
>>>>>> portrait/landscape
>>>>>> 1-up/2-up (in the case of books/booklets)
>>>>>>
>>>>>> In the case of books/booklets, the pages need to be split in half and the
>>>>>> individual pages reordered so that they are in page number order.
>>>>>>
>>>>>> The resulting pages are then rotated as necessary and output either as is
>>>>>> (for use on a tablet), or arranged on letter-sized pages. In this latter
>>>>>> case, the pages are moved to the upper right for simplex printing or
>>>>>> alternately upper right and upper left for duplex printing.
>>>>>>
>>>>>> I assume the easiest way would be to build a single image, rotate the
>>>>>> image as necessary, resize the image as necessary and write the image to a
>>>>>> new page. If so, what operations are required to assemble all the source
>>>>>> images into a single image? Or is there an easier way to do this with PDFBox?
>>>>>>
>>>>> Ah, I think I understand: you look at the images in the resources, and
>>>>> based on the width / height ratio, you make a decision.
>>>>> However there's no guarantee that the images will come in a certain
>>>>> sequence.
>>>>>
>>>>> Why not simply render the PDF pages?
>>>>>
>>>>>
>>>>>
>>>>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>>>>
>>>>>
>>>>> Tilman
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>>>
>>>>>> It's on https://www.daley.ws/Believe.pdf
>>>>>>
>>>>>>> I found PDFDebugger, thanks!
>>>>>>>
>>>>>>> Now that I look at my code again, it looks like I am reading a list of
>>>>>>> images.
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>
>>>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>>>
>>>>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it
>>>>>>>>> out of Version 1.
>>>>>>>>
>>>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>>>>
>>>>>>>> PDFDebugger is there, I even made a change earlier today! It is part of
>>>>>>>> the PDFBox app jar.
>>>>>>>>
>>>>>>>> Tilman
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>>>
>>>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>>
>>>>>>>>>> The XObjects should be the same count in version 1 and 2.
>>>>>>>>>>
>>>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>>>>>
>>>>>>>>>>> Tilman
>>>>>>>>>>>
>>>>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>>>>
>>>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>>>>> heavily on Version 1's structure.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>>>> // Note: as posted, nothing copies the XObjects from pdResources into images.
>>>>>>>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>>>>> {
>>>>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>>>>     {
>>>>>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>>>>>         try
>>>>>>>>>>>>         {
>>>>>>>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>>>>>>>         }
>>>>>>>>>>>>         catch (Throwable t)
>>>>>>>>>>>>         {
>>>>>>>>>>>>             t.printStackTrace();
>>>>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>>>>         }
>>>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>>>>         {
>>>>>>>>>>>>             // 2-up scan: the left and right halves are separate booklet pages.
>>>>>>>>>>>>             width /= 2;
>>>>>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>>>>         }
>>>>>>>>>>>>         else
>>>>>>>>>>>>         {
>>>>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>>>>         }
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>        On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>            It appears that PDFBOX 2 handles scanned documents
>>>>>>>>>>>>            differently than PDFBOX 1.
>>>>>>>>>>>>
>>>>>>>>>>>>            I have multipage PDFs that I have scanned from a
>>>>>>>>>>>>            Konica/Minolta C224e. The PDFs in version 1 seemed to
>>>>>>>>>>>>            come in as a single image. Now in version 2, they seem
>>>>>>>>>>>>            to come in as multiple images. I assume this is to
>>>>>>>>>>>>            reduce the size of the resultant PDFs.
>>>>>>>>>>>>
>>>>>>>>>>>>            Is there a way to retrieve each page as a single image
>>>>>>>>>>>>            or is there a method to merge all the images on a page
>>>>>>>>>>>>            into a single image?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>        Can't comment without having a sample PDF. And I don't know
>>>>>>>>>>>>        what you mean with "seemed to come in as a single image".
>>>>>>>>>>>>
>>>>>>>>>>>>        Tilman
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>        To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>>>>        For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Tim Daley*
>>>>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>>>>>> tim.daley@cru.org
>
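
For anyone reading this thread later: the booklet step discussed above (split each 2-up scan in half, then put the halves back in page number order) can be sketched with plain java.awt, independent of how the page images were obtained (for example by rendering each PDF page). This is only a sketch under assumptions. It reuses the page-number formula from the quoted code; the class and method names (BookletSplitter, split, leftPageNo, rightPageNo) are made up for illustration and are not part of PDFBox, and it assumes the scans arrive in sheet-side order (front, back, front, back, ...).

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper: splits 2-up booklet scans and orders the halves by page number. */
final class BookletSplitter {

    /** 1-based page number printed on the right half of scanned side i (0-based). */
    static int rightPageNo(int i, int sideCount) {
        return (i % 2 == 0) ? i + 1 : sideCount * 2 - i;
    }

    /** 1-based page number printed on the left half of scanned side i (0-based). */
    static int leftPageNo(int i, int sideCount) {
        return (i % 2 == 0) ? sideCount * 2 - i : i + 1;
    }

    /** Cuts every scanned side down the middle; the sorted map yields reading order. */
    static SortedMap<Integer, BufferedImage> split(List<BufferedImage> sides) {
        SortedMap<Integer, BufferedImage> pages = new TreeMap<>();
        int sideCount = sides.size();
        for (int i = 0; i < sideCount; i++) {
            BufferedImage side = sides.get(i);
            int half = side.getWidth() / 2;
            int height = side.getHeight();
            // getSubimage shares pixel data with the source image, so no copy is made.
            pages.put(leftPageNo(i, sideCount), side.getSubimage(0, 0, half, height));
            pages.put(rightPageNo(i, sideCount), side.getSubimage(half, 0, half, height));
        }
        return pages;
    }

    public static void main(String[] args) {
        // Four blank landscape "scans" stand in for the sides of an 8-page booklet.
        List<BufferedImage> sides = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            sides.add(new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB));
        }
        System.out.println(split(sides).keySet()); // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

With four scanned sides the halves map to pages 8|1, 2|7, 6|3 and 4|5, which is the usual saddle-stitch layout. If the scanner delivers sides in a different order, the two formulas would need adjusting.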


-- 
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org
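
The letter-page arrangement mentioned in the thread (upper right for simplex, alternating upper right and upper left for duplex) could look roughly like the sketch below. Everything here is an assumption for illustration: the class name LetterPlacer, the 72 dpi canvas size, and the choice that odd page numbers go to the upper right while even page numbers go to the upper left in duplex mode. It also assumes the page image has already been rotated and scaled to fit the sheet.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical helper: pins a page image to a top corner of a letter-sized canvas. */
final class LetterPlacer {

    static final int DPI = 72;                     // assumed canvas resolution
    static final int LETTER_W = (int) (8.5 * DPI); // 612 pixels
    static final int LETTER_H = 11 * DPI;          // 792 pixels

    /** X offset of the page: the right edge, or the left edge for even duplex pages. */
    static int xOffset(int pageNo, int pageWidth, boolean duplex) {
        boolean upperLeft = duplex && pageNo % 2 == 0;
        return upperLeft ? 0 : LETTER_W - pageWidth;
    }

    /** Draws the page at the top of a white letter-sized sheet. */
    static BufferedImage place(BufferedImage page, int pageNo, boolean duplex) {
        BufferedImage sheet = new BufferedImage(LETTER_W, LETTER_H, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = sheet.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, LETTER_W, LETTER_H);
            // Image coordinates start at the top-left corner, so y = 0 is the top edge.
            g.drawImage(page, xOffset(pageNo, page.getWidth(), duplex), 0, null);
        } finally {
            g.dispose();
        }
        return sheet;
    }
}
```

Alternating corners in duplex mode keeps the content on the same physical corner of the paper on both sides of a sheet, which is presumably the point of the upper right / upper left alternation.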
