pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Thu, 24 Sep 2015 23:50:27 GMT
I checked the Maven source for the package and it only contained metadata.
I would have tried to check it out.

On Thu, Sep 24, 2015 at 2:36 AM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 24.09.2015 at 04:22, Tim Daley wrote:
>
>> Another note. When I was using PDFDebugger, I could get it to load a URL,
>> but when I tried the file selection menu, all the PDF files were grayed
>> out.
>>
>> I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.
>>
>
> Weird... It worked for me (on windows)... Another possibility is to use
> drag and drop.
>
>
> Tilman
>
> On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley <tim.daley@cru.org> wrote:
>>
>> That's what I was looking for!!!
>>>
>>> Thanks!
>>>
>>> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <THausherr@t-online.de>
>>> wrote:
>>>
>>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>>>
>>>> The main gist of the program is to read in a multi-page pdf. Based on a
>>>>> control file, detect what type of document this represents:
>>>>>
>>>>> simplex/duplex
>>>>> portrait/landscape
>>>>> 1-up/2-up (in the case of books/booklets)
>>>>>
>>>>> In the case of books/booklets, the pages need to be split in half and the
>>>>> individual pages reordered so that they are in page number order.
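
The split-and-reorder step described above can be sketched with the JDK alone. This is only an illustration, not code from the thread: the class and method names (BookletSplit, pageNumbers, splitInHalf, reorder) are hypothetical, and it assumes each scanned side has already been obtained as a BufferedImage with the left page in the left half. The page-number formula mirrors the one in the code posted further down in this thread.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: splits 2-up booklet scans and puts the halves in page order.
final class BookletSplit {

    // For scanned side i (0-based) out of sideCount sides, returns
    // {leftPageNo, rightPageNo} as 1-based page numbers.
    static int[] pageNumbers(int i, int sideCount) {
        boolean even = i % 2 == 0;
        int low = i + 1;               // counts up from the front of the booklet
        int high = sideCount * 2 - i;  // counts down from the back of the booklet
        return even ? new int[] {high, low} : new int[] {low, high};
    }

    // Cuts one scanned side into its left and right halves (views on the source image).
    static BufferedImage[] splitInHalf(BufferedImage side) {
        int half = side.getWidth() / 2;
        int height = side.getHeight();
        return new BufferedImage[] {
            side.getSubimage(0, 0, half, height),
            side.getSubimage(half, 0, half, height)
        };
    }

    // Returns the half-pages of all sides, indexed by page number minus one.
    static BufferedImage[] reorder(BufferedImage[] sides) {
        BufferedImage[] pages = new BufferedImage[sides.length * 2];
        for (int i = 0; i < sides.length; i++) {
            int[] numbers = pageNumbers(i, sides.length);
            BufferedImage[] halves = splitInHalf(sides[i]);
            pages[numbers[0] - 1] = halves[0];
            pages[numbers[1] - 1] = halves[1];
        }
        return pages;
    }

    public static void main(String[] args) {
        // Print the left/right page numbers for a booklet scanned as 4 sides.
        for (int i = 0; i < 4; i++) {
            int[] numbers = pageNumbers(i, 4);
            System.out.println("side " + i + ": left=" + numbers[0] + " right=" + numbers[1]);
        }
    }
}
```

Note that getSubimage shares pixel data with the source image, so the halves should be copied first if the original scan is going to be modified or discarded.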
>>>>>
>>>>> The resulting pages are then rotated as necessary and output either as is
>>>>> (for use on a tablet), or arranged on letter-sized pages. In this latter
>>>>> case, the pages are moved to the upper right for simplex printing or
>>>>> alternately upper right and upper left for duplex printing.
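
The rotate-and-place step described above could look roughly like the following JDK-only sketch. It is not the poster's rotate90DX or putPage: the names LetterLayout, rotate90Clockwise, xOffset and placeOnLetter are hypothetical. It also assumes that for duplex printing the odd-numbered pages go to the upper right and the even-numbered ones to the upper left (the message only says "alternately"), and that the page image already fits on the sheet, so no resizing is done.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper: rotates a page image and places it on a letter-sized canvas.
final class LetterLayout {

    // Rotates the image 90 degrees clockwise, pixel by pixel.
    static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    // Horizontal offset of the page on the canvas. Simplex: always flush right.
    // Duplex (assumption): odd pages flush right, even pages flush left.
    static int xOffset(int pageNo, boolean duplex, int canvasWidth, int pageWidth) {
        boolean upperRight = !duplex || pageNo % 2 == 1;
        return upperRight ? canvasWidth - pageWidth : 0;
    }

    // Draws the page at the top of a white 8.5 x 11 inch canvas at the given DPI.
    static BufferedImage placeOnLetter(BufferedImage page, int pageNo, boolean duplex, int dpi) {
        int width = Math.round(8.5f * dpi);
        int height = 11 * dpi;
        BufferedImage canvas = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.drawImage(page, xOffset(pageNo, duplex, width, page.getWidth()), 0, null);
        } finally {
            g.dispose();
        }
        return canvas;
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(300, 400, BufferedImage.TYPE_INT_RGB);
        BufferedImage sheet = placeOnLetter(rotate90Clockwise(page), 1, true, 72);
        System.out.println(sheet.getWidth() + " x " + sheet.getHeight());
    }
}
```

A page image larger than the canvas is simply clipped by drawImage here; scaling it down first would be a separate step.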
>>>>>
>>>>> I assume the easiest way would be to build a single image, rotate the image
>>>>> as necessary, resize the image as necessary and write the image to a new
>>>>> page. If so, what operations are required to assemble all the source images
>>>>> into a single image? Or is there an easier way to do this with PDFBox?
>>>>>
>>>> Ah, I think I understand: you look at the images in the resources, and
>>>> based on the width / height ratio, you make a decision.
>>>> However there's no guarantee that the images will come in a certain
>>>> sequence.
>>>>
>>>> Why not simply render the PDF pages?
>>>>
>>>>
>>>>
>>>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>>>
>>>>
>>>> Tilman
>>>>
>>>>
>>>>
>>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>>
>>>>> It's on https://www.daley.ws/Believe.pdf
>>>>>
>>>>>> I found PDFDebugger, thanks!
>>>>>>
>>>>>> Now that I look at my code again, it looks like I am reading a list of
>>>>>> images.
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>> wrote:
>>>>>>
>>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>>
>>>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out
>>>>>>>> of Version 1.
>>>>>>>>
>>>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>>>>
>>>>>>> PDFDebugger is there, I even made a change earlier today! It is part of
>>>>>>> the PDFBox app jar.
>>>>>>>
>>>>>>> Tilman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>>
>>>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The XObjects should be the same count in version 1 and 2.
>>>>>>>>>>
>>>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>>>>
>>>>>>>>>> Tilman
>>>>>>>>>>
>>>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>>>
>>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>>>> heavily on Version 1's structure.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>>>
>>>>>>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>>>> {
>>>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>>>     {
>>>>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>>>>         try
>>>>>>>>>>>         {
>>>>>>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>>>>>>         }
>>>>>>>>>>>         catch (Throwable t)
>>>>>>>>>>>         {
>>>>>>>>>>>             t.printStackTrace();
>>>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>>>         }
>>>>>>>>>>>
>>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>>>
>>>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>>>
>>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>>>         {
>>>>>>>>>>>             width /= 2;
>>>>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>>>         }
>>>>>>>>>>>         else
>>>>>>>>>>>         {
>>>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>>>         }
>>>>>>>>>>>     }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>        On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>>
>>>>>>>>>>>            It appears that PDFBOX 2 handles scanned documents differently
>>>>>>>>>>>            than PDFBOX 1.
>>>>>>>>>>>
>>>>>>>>>>>            I have multipage PDFs that I have scanned from a
>>>>>>>>>>>            Konica/Minolta C224e. The PDFs in version 1 seemed to come in
>>>>>>>>>>>            as a single image. Now in version 2, they seem to come in as
>>>>>>>>>>>            multiple images. I assume this is to reduce the size of the
>>>>>>>>>>>            resultant PDFs.
>>>>>>>>>>>
>>>>>>>>>>>            Is there a way to retrieve each page as a single image, or is
>>>>>>>>>>>            there a method to merge all the images on a page into a single
>>>>>>>>>>>            image?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        Can't comment without having a sample PDF. And I don't know what
>>>>>>>>>>>        you mean by "seemed to come in as a single image".
>>>>>>>>>>>
>>>>>>>>>>>        Tilman
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>        To unsubscribe, e-mail:
>>>>>>>>>>> users-unsubscribe@pdfbox.apache.org
>>>>>>>>>>>        <mailto:users-unsubscribe@pdfbox.apache.org>
>>>>>>>>>>>        For additional commands, e-mail:
>>>>>>>>>>> users-help@pdfbox.apache.org
>>>>>>>>>>>        <mailto:users-help@pdfbox.apache.org>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Tim Daley*
>>>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>>>>> tim.daley@cru.org <mailto:tim.daley@cru.org>
>>>>>>>>>>>


-- 
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org
