Subject: Re: PDFBOX 2 scanned documents
To: users@pdfbox.apache.org
From: Tilman Hausherr
Date: Thu, 24 Sep 2015 08:36:17 +0200

On 24.09.2015 at 04:22, Tim Daley wrote:
> Another note. When I was using PDFDebugger, I could get it to load a URL,
> but when I tried the file selection menu, all the PDF files were grayed
> out.
>
> I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.

Weird... It worked for me (on Windows)... Another possibility is to use
drag and drop.

Tilman

> On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley wrote:
>
>> That's what I was looking for!!!
>>
>> Thanks!
>>
>> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr wrote:
>>
>>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>>
>>>> The main gist of the program is to read in a multi-page PDF. Based on a
>>>> control file, detect what type of document this represents:
>>>>
>>>> simplex/duplex
>>>> portrait/landscape
>>>> 1-up/2-up (in the case of books/booklets)
>>>>
>>>> In the case of books/booklets, the pages need to be split in half and the
>>>> individual pages reordered so that they are in page number order.
>>>>
>>>> The resulting pages are then rotated as necessary and output either as is
>>>> (for use on a tablet), or arranged on letter sized pages. In this latter
>>>> case, the pages are moved to the upper right for simplex printing or
>>>> alternately upper right and upper left for duplex printing.
>>>>
>>>> I assume the easiest way would be to build a single image, rotate the
>>>> image as necessary, resize the image as necessary and write the image to
>>>> a new page. If so, what operations are required to assemble all the
>>>> source images into a single image? Or is there an easier way to do this
>>>> with PDFBox?
>>>>
>>> Ah, I think I understand: you look at the images in the resources, and
>>> based on the width / height ratio, you make a decision.
>>> However there's no guarantee that the images will come in a certain
>>> sequence.
>>>
>>> Why not simply render the PDF pages?
>>>
>>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>>
>>> Tilman
>>>
>>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley wrote:
>>>>
>>>>> It's on https://www.daley.ws/Believe.pdf
>>>>> I found PDFDebugger, thanks!
>>>>>
>>>>> Now that I look at my code again, it looks like I am reading a list of
>>>>> images.
>>>>>
>>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr wrote:
>>>>>
>>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out
>>>>>>> of Version 1.
>>>>>>>
>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>> PDFDebugger is there, I even made a change earlier today! It is part of
>>>>>> the PDFBox app jar.
>>>>>>
>>>>>> Tilman
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley wrote:
>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The XObjects should be the same count in version 1 and 2.
>>>>>>>>>
>>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>>>
>>>>>>>>> Tilman
>>>>>>>>>
>>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>>
>>>>>>>>>> Here's the basic code that used to work. Granted, it probably
>>>>>>>>>> depends heavily on Version 1's structure.
>>>>>>>>>>
>>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>>> Map images = new TreeMap();
>>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>> for (Entry objectImageEntry : images.entrySet())
>>>>>>>>>> {
>>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>>     {
>>>>>>>>>>         PDImageXObject pdXObjectImage = ((PDImageXObject) pdXObject);
>>>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>>>         try { bufferedImage = pdXObjectImage.getImage(); }
>>>>>>>>>>         catch (Throwable t)
>>>>>>>>>>         {
>>>>>>>>>>             t.printStackTrace();
>>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>>         }
>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>>         {
>>>>>>>>>>             width /= 2;
>>>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>>         }
>>>>>>>>>>         else
>>>>>>>>>>         {
>>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>>         }
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>
>>>>>>>>>> It appears that PDFBOX 2 handles scanned documents differently
>>>>>>>>>> than PDFBOX 1.
>>>>>>>>>>
>>>>>>>>>> I have multipage PDFs that I have scanned from a Konica/Minolta
>>>>>>>>>> C224e. The PDFs in version 1 seemed to come in as a single image.
>>>>>>>>>> Now in version 2, they seem to come in as multiple images. I
>>>>>>>>>> assume this is to reduce the size of the resultant PDFs.
>>>>>>>>>>
>>>>>>>>>> Is there a way to retrieve each page as a single image or is
>>>>>>>>>> there a method to merge all the images on a page into a single
>>>>>>>>>> image?
>>>>>>>>>>
>>>>>>>>>> Can't comment without having a sample PDF. And I don't know what
>>>>>>>>>> you mean by "seemed to come in as a single image".
>>>>>>>>>>
>>>>>>>>>> Tilman
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Tim Daley*
>>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>>>> tim.daley@cru.org
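
The split-and-reorder step for 2-up booklet scans discussed in the quoted thread can be sketched on its own, independent of how each page image was obtained (for instance by rendering the page instead of collecting XObjects). This is only a minimal sketch using the JDK's BufferedImage. The class name BookletSplit and its methods are hypothetical and not part of PDFBox. It assumes the page-number arithmetic from the quoted code (i is the 0-based scanned side, pageCount the number of scanned sides) and that every image is at least 2 pixels wide.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper (not a PDFBox class): splits 2-up booklet scans into single pages. */
final class BookletSplit {

    /** 1-based page number on the right half of scanned side i (0-based). */
    static int rightPageNo(int i, int pageCount) {
        return (i % 2 == 0) ? i + 1 : pageCount * 2 - i;
    }

    /** 1-based page number on the left half of scanned side i (0-based). */
    static int leftPageNo(int i, int pageCount) {
        return (i % 2 == 0) ? pageCount * 2 - i : i + 1;
    }

    /**
     * Returns {leftHalf, rightHalf}. If the width is odd, the last pixel
     * column is dropped. The halves share pixel data with the source image.
     */
    static BufferedImage[] splitInHalf(BufferedImage twoUp) {
        int half = twoUp.getWidth() / 2;
        int height = twoUp.getHeight();
        return new BufferedImage[] {
            twoUp.getSubimage(0, 0, half, height),
            twoUp.getSubimage(half, 0, half, height)
        };
    }

    /** Splits every scanned side and returns the halves in page-number order. */
    static BufferedImage[] reorder(BufferedImage[] sides) {
        int pageCount = sides.length;
        BufferedImage[] pages = new BufferedImage[pageCount * 2];
        for (int i = 0; i < pageCount; i++) {
            BufferedImage[] halves = splitInHalf(sides[i]);
            pages[leftPageNo(i, pageCount) - 1] = halves[0];
            pages[rightPageNo(i, pageCount) - 1] = halves[1];
        }
        return pages;
    }
}
```

With four scanned sides, side 0 carries pages 8 (left) and 1 (right), and side 1 carries pages 2 (left) and 7 (right), so reorder returns eight single pages in reading order.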
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org