pdfbox-users mailing list archives

From Tim Daley <tim.da...@cru.org>
Subject Re: PDFBOX 2 scanned documents
Date Fri, 25 Sep 2015 11:02:35 GMT
Tims-MacBook-Pro-2:Downloads tdaley$ jar -xvf pdfbox-app-2.0.0-20150925.080424-1677-sources.jar
  created: META-INF/
 inflated: META-INF/MANIFEST.MF
 inflated: META-INF/DEPENDENCIES
 inflated: META-INF/NOTICE
 inflated: META-INF/LICENSE
Tims-MacBook-Pro-2:Downloads tdaley$



On Fri, Sep 25, 2015 at 1:13 AM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 25.09.2015 at 01:50, Tim Daley wrote:
>
>> I checked the Maven source for the package and it only contained metadata.
>> I would have tried to check it out.
>>
>
> Not sure what the question is, but the latest 2.0 app snapshot is here:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
>
> just download the latest jar file.
>
> Tilman
>
>
>
>> On Thu, Sep 24, 2015 at 2:36 AM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>
>>> On 24.09.2015 at 04:22, Tim Daley wrote:
>>>
>>>> Another note. When I was using PDFDebugger, I could get it to load a URL,
>>>> but when I tried the file selection menu, all the PDF files were grayed
>>>> out.
>>>>
>>>> I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.
>>>>
>>> Weird... It worked for me (on Windows)... Another possibility is to use
>>> drag and drop.
>>>
>>>
>>> Tilman
>>>
>>> On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>
>>>> That's what I was looking for!!!
>>>>
>>>>> Thanks!
>>>>>
>>>>> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <
>>>>> THausherr@t-online.de>
>>>>> wrote:
>>>>>
>>>>>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>>>>>
>>>>>>> The main gist of the program is to read in a multi-page PDF. Based on a
>>>>>>> control file, detect what type of document this represents:
>>>>>>>
>>>>>>> simplex/duplex
>>>>>>> portrait/landscape
>>>>>>> 1-up/2-up (in the case of books/booklets)
>>>>>>>
>>>>>>> In the case of books/booklets, the pages need to be split in half and
>>>>>>> the individual pages reordered so that they are in page number order.
>>>>>>>
>>>>>>> The resulting pages are then rotated as necessary and output either as
>>>>>>> is (for use on a tablet), or arranged on letter-sized pages. In this
>>>>>>> latter case, the pages are moved to the upper right for simplex printing
>>>>>>> or alternately upper right and upper left for duplex printing.
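[Editor's sketch] The letter-page arrangement described above could be sketched with plain `java.awt` as follows. This is only an illustration under stated assumptions, not the poster's code: the class and method names are hypothetical, the DPI is a free parameter, and which pages go upper right versus upper left in duplex mode (even indexes right, odd indexes left) is a guess, since the thread does not say.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox or of the original program.
class LetterPlacement {

    /** Draws a page image onto a white US Letter canvas (8.5 x 11 in) at the given DPI. */
    static BufferedImage placeOnLetter(BufferedImage page, int dpi, boolean upperRight) {
        int canvasW = Math.round(8.5f * dpi);
        int canvasH = 11 * dpi;
        BufferedImage canvas = new BufferedImage(canvasW, canvasH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, canvasW, canvasH);
            // Upper right: align the page's right edge with the canvas edge.
            // Upper left: start at x = 0. The page always sits at the top (y = 0).
            int x = upperRight ? canvasW - page.getWidth() : 0;
            g.drawImage(page, x, 0, null);
        } finally {
            g.dispose();
        }
        return canvas;
    }

    /** Assumption: simplex is always upper right; duplex alternates, starting upper right. */
    static boolean isUpperRight(int pageIndex, boolean duplex) {
        return !duplex || pageIndex % 2 == 0;
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(400, 600, BufferedImage.TYPE_INT_RGB);
        BufferedImage sheet = placeOnLetter(page, 100, isUpperRight(0, true));
        System.out.println(sheet.getWidth() + " x " + sheet.getHeight());
    }
}
```

The sketch does not scale the page; an image larger than the canvas would simply be clipped, so a resize step would be needed first in that case.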
>>>>>>>
>>>>>>> I assume the easiest way would be to build a single image, rotate the
>>>>>>> image as necessary, resize the image as necessary and write the image to
>>>>>>> a new page. If so, what operations are required to assemble all the
>>>>>>> source images into a single image? Or is there an easier way to do this
>>>>>>> with PDFBox?
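[Editor's sketch] For the "rotate the image as necessary" step, a quarter-turn can be done with a plain pixel copy on a `BufferedImage`. This is a minimal sketch, not the poster's `rotate90DX` (whose implementation and direction are not shown in the thread); the class name is hypothetical and a clockwise turn is assumed.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper; the direction (clockwise) is an assumption.
class Rotate90 {

    /** Returns a new image holding src rotated 90 degrees clockwise. */
    static BufferedImage rotateClockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap places after a quarter-turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Source pixel (x, y) lands at column h-1-y, row x.
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB);
        BufferedImage dst = rotateClockwise(src);
        System.out.println(dst.getWidth() + " x " + dst.getHeight());
    }
}
```

A per-pixel loop is slow for large scans but keeps the mapping explicit; an `AffineTransform` drawn through `Graphics2D` would be the usual faster route.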
>>>>>>>
>>>>>> Ah, I think I understand: you look at the images in the resources, and
>>>>>> based on the width / height ratio, you make a decision.
>>>>>> However there's no guarantee that the images will come in a certain
>>>>>> sequence.
>>>>>>
>>>>>> Why not simply render the PDF pages?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>>>>>
>>>>>>
>>>>>> Tilman
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley <tim.daley@cru.org> wrote:
>>>>>>
>>>>>>> It's on https://www.daley.ws/Believe.pdf
>>>>>>>
>>>>>>> I found PDFDebugger, thanks!
>>>>>>>
>>>>>>> Now that I look at my code again, it looks like I am reading a list of
>>>>>>> images.
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>>>>>
>>>>>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out
>>>>>>>>>> of Version 1.
>>>>>>>>>>
>>>>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>>>>>
>>>>>>>>> PDFDebugger is there, I even made a change earlier today! It is part
>>>>>>>>> of the PDFBox app jar.
>>>>>>>>>
>>>>>>>>> Tilman
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley <tim.daley@cru.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The XObjects should be the same count in version 1 and 2.
>>>>>>>>>>>
>>>>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>>>>>
>>>>>>>>>>> Tilman
>>>>>>>>>>>
>>>>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>>>>>> heavily on Version 1's structure.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>>>>>> // NOTE: as posted, this map is never filled from pdResources, so the
>>>>>>>>>>>>> // loop below has nothing to iterate over; that step is missing here.
>>>>>>>>>>>>> Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
>>>>>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>>>>>
>>>>>>>>>>>>> for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>>>>>   if (pdXObject instanceof PDImageXObject)
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>>>>>     BufferedImage bufferedImage = null;
>>>>>>>>>>>>>     try
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       bufferedImage = pdXObjectImage.getImage();
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     catch (Throwable t)
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       t.printStackTrace();
>>>>>>>>>>>>>       randomAccessFile.close();
>>>>>>>>>>>>>       throw new RuntimeException(t);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>>>>>       bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>>>>>
>>>>>>>>>>>>>     int width = bufferedImage.getWidth();
>>>>>>>>>>>>>     int height = bufferedImage.getHeight();
>>>>>>>>>>>>>
>>>>>>>>>>>>>     if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       // Two-up scan: split into halves and number them in booklet order.
>>>>>>>>>>>>>       width /= 2;
>>>>>>>>>>>>>       boolean even = i % 2 == 0;
>>>>>>>>>>>>>       int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>>>>>       int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>>>>>       putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>>>>>       putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>>>>>       putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
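[Editor's sketch] The two-up branch of the code above can be tried in isolation. The following is a stand-alone sketch that reuses the same page-number arithmetic and replaces the unseen `putPage` with `BufferedImage.getSubimage`. The class and method names are hypothetical, and it assumes `i` is the 0-based index of a scanned side and `pageCount` is the number of scanned sides, which the thread does not state explicitly.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper mirroring the arithmetic in the posted code.
class BookletSplit {

    /** Page number printed on the right half of scan i (0-based). */
    static int rightPageNo(int i, int pageCount) {
        return (i % 2 == 0) ? i + 1 : pageCount * 2 - i;
    }

    /** Page number printed on the left half of scan i (0-based). */
    static int leftPageNo(int i, int pageCount) {
        return (i % 2 == 0) ? pageCount * 2 - i : i + 1;
    }

    /** Splits a two-up scan into { left half, right half }. */
    static BufferedImage[] splitHalves(BufferedImage scan) {
        int half = scan.getWidth() / 2;
        int h = scan.getHeight();
        // getSubimage returns views that share pixel data with the original.
        return new BufferedImage[] {
            scan.getSubimage(0, 0, half, h),
            scan.getSubimage(half, 0, half, h)
        };
    }

    public static void main(String[] args) {
        int pageCount = 2; // two scanned sides, i.e. a four-page booklet
        for (int i = 0; i < pageCount; i++) {
            System.out.println("scan " + i + ": left=" + leftPageNo(i, pageCount)
                    + " right=" + rightPageNo(i, pageCount));
        }
    }
}
```

With two scanned sides, this numbers the first scan as pages 4 (left) and 1 (right) and the second as pages 2 (left) and 3 (right), which is the usual saddle-stitch ordering.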
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>             It appears that PDFBOX 2 handles scanned documents
>>>>>>>>>>>>>             differently than PDFBOX 1.
>>>>>>>>>>>>>
>>>>>>>>>>>>>             I have multipage PDFs that I have scanned from a
>>>>>>>>>>>>>             Konica/Minolta C224e. The PDFs in version 1 seemed to
>>>>>>>>>>>>>             come in as a single image. Now in version 2, they seem
>>>>>>>>>>>>>             to come in as multiple images. I assume this is to
>>>>>>>>>>>>>             reduce the size of the resultant PDFs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>             Is there a way to retrieve each page as a single image
>>>>>>>>>>>>>             or is there a method to merge all the images on a page
>>>>>>>>>>>>>             into a single image?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Can't comment without having a sample PDF. And I don't know
>>>>>>>>>>>>>         what you mean with "seemed to come in as a single image".
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Tilman
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>         To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>>>>>         For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Tim Daley*
>>>>>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>>>>>> o: 407-826-2911 | m: 407-716-0284
>>>>>>>>>>>>> tim.daley@cru.org


-- 
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org
