Date: Wed, 23 Sep 2015 22:22:16 -0400
Subject: Re: PDFBOX 2 scanned documents
From: Tim Daley
To: users@pdfbox.apache.org

Another note: when I was using PDFDebugger, I could get it to load a URL, but when I tried the file selection menu, all the PDF files were grayed out. I'm on OS X 10.10. It's not a biggie for me, as I can just use a URL.

On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley wrote:

> That's what I was looking for!!!
>
> Thanks!
>
> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr wrote:
>
>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>
>>> The main gist of the program is to read in a multi-page PDF and, based on a control file, detect what type of document it represents:
>>>
>>> simplex/duplex
>>> portrait/landscape
>>> 1-up/2-up (in the case of books/booklets)
>>>
>>> In the case of books/booklets, the pages need to be split in half and the individual pages reordered so that they are in page-number order.
>>>
>>> The resulting pages are then rotated as necessary and output either as is (for use on a tablet) or arranged on letter-sized pages. In the latter case, the pages are moved to the upper right for simplex printing, or alternately to the upper right and upper left for duplex printing.
>>>
>>> I assume the easiest way would be to build a single image, rotate it as necessary, resize it as necessary, and write it to a new page. If so, what operations are required to assemble all the source images into a single image? Or is there an easier way to do this with PDFBox?
>>
>> Ah, I think I understand: you look at the images in the resources and, based on the width/height ratio, you make a decision. However, there's no guarantee that the images will come in a certain sequence.
>>
>> Why not simply render the PDF pages?
>>
>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>
>> Tilman
>>
>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley wrote:
>>>
>>>> It's on https://www.daley.ws/Believe.pdf
>>>>
>>>> I found PDFDebugger, thanks!
>>>>
>>>> Now that I look at my code again, it looks like I am reading a list of images.
>>>>
>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr wrote:
>>>>
>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>
>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out of Version 1.
>>>>>
>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>
>>>>> PDFDebugger is there; I even made a change earlier today! It is part of the PDFBox app jar.
>>>>>
>>>>> Tilman
>>>>>
>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley wrote:
>>>>>>
>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>
>>>>>>>> The number of XObjects should be the same in versions 1 and 2.
>>>>>>>>
>>>>>>>> If you don't want to share the PDFs, then look at them with the new PDFDebugger. You can see the XObject images easily.
>>>>>>>>
>>>>>>>> Tilman
>>>>>>>>
>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>
>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends heavily on Version 1's structure.
>>>>>>>>>
>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>> Map<String, PDXObject> images = new TreeMap<>();
>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>> // Collect the page's XObjects by resource name (PDFBox 2.0 API);
>>>>>>>>> // without this step the map stays empty and the loop below never runs.
>>>>>>>>> for (COSName name : pdResources.getXObjectNames())
>>>>>>>>> {
>>>>>>>>>     images.put(name.getName(), pdResources.getXObject(name));
>>>>>>>>> }
>>>>>>>>> for (Entry<String, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>> {
>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>     {
>>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>>         try
>>>>>>>>>         {
>>>>>>>>>             bufferedImage = pdXObjectImage.getImage();
>>>>>>>>>         }
>>>>>>>>>         catch (Throwable t)
>>>>>>>>>         {
>>>>>>>>>             t.printStackTrace();
>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>         }
>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>         {
>>>>>>>>>             // 2-up scan: split the spread into halves and
>>>>>>>>>             // compute the page number of each half.
>>>>>>>>>             width /= 2;
>>>>>>>>>             boolean even = i % 2 == 0;
>>>>>>>>>             int rightPageNo = even ? i + 1 : pageCount * 2 - i;
>>>>>>>>>             int leftPageNo = even ? pageCount * 2 - i : i + 1;
>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>         }
>>>>>>>>>         else
>>>>>>>>>         {
>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>
>>>>>>>>>> On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>
>>>>>>>>>>> It appears that PDFBox 2 handles scanned documents differently than PDFBox 1.
>>>>>>>>>>>
>>>>>>>>>>> I have multipage PDFs that I have scanned from a Konica/Minolta C224e. The PDFs in version 1 seemed to come in as a single image. Now in version 2, they seem to come in as multiple images. I assume this is to reduce the size of the resultant PDFs.
>>>>>>>>>>>
>>>>>>>>>>> Is there a way to retrieve each page as a single image, or is there a method to merge all the images on a page into a single image?
>>>>>>>>>>
>>>>>>>>>> Can't comment without having a sample PDF. And I don't know what you mean by "seemed to come in as a single image".
>>>>>>>>>>
>>>>>>>>>> Tilman
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Tim Daley*
>>>>>>>>> IT Specialist-Operating Systems
>>>>>>>>> cru | Engagement & Services | Platform Team
>>>>>>>>> o:407-826-2911 | m:407-716-0284
>>>>>>>>> tim.daley@cru.org
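
Tilman's suggestion in the thread, rendering each page instead of collecting its image XObjects, avoids both the split-image problem and the ordering problem, because the renderer draws everything on a page into one bitmap. A minimal sketch, assuming PDFBox 2.0.x on the classpath (`PDDocument.load` was replaced by `Loader.loadPDF` in 3.0); `PageRasterizer` and `renderAll` are hypothetical names, not part of PDFBox:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper (not part of PDFBox); assumes the PDFBox 2.0.x API.
final class PageRasterizer {
    private PageRasterizer() {}

    // Renders each page to a single RGB bitmap. The renderer draws the whole
    // page (all image XObjects plus any text or vector content), so it does
    // not matter how many images the scanner split a page into, or in which
    // order they appear in the page resources.
    static List<BufferedImage> renderAll(PDDocument document, float dpi) throws IOException {
        PDFRenderer renderer = new PDFRenderer(document);
        List<BufferedImage> pages = new ArrayList<>();
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            pages.add(renderer.renderImageWithDPI(i, dpi, ImageType.RGB));
        }
        return pages;
    }

    // Convenience overload for a file on disk (PDFBox 2.0.x loading API).
    static List<BufferedImage> renderAll(File pdf, float dpi) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            return renderAll(document, dpi);
        }
    }
}
```

For example, `PageRasterizer.renderAll(new File("Believe.pdf"), 300f)` would return one image per scanned page; 300 DPI is an arbitrary choice, and all pages are held in memory at once, so large scans may call for rendering one page at a time instead.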
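
For the 2-up case, each page image can then be cut into its left and right halves, which is what the `width /= 2` and the two `putPage` calls in the quoted code appear to do. A sketch using only the JDK; `TwoUpSplitter` is a hypothetical helper. Unlike the quoted code, it gives the extra column of an odd-width image to the right half instead of dropping it:

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for splitting a 2-up spread; uses only the JDK.
final class TwoUpSplitter {
    private TwoUpSplitter() {}

    // Returns {left, right}. getSubimage shares the parent's pixel data,
    // so the halves are views onto the spread, not copies.
    static BufferedImage[] split(BufferedImage spread) {
        int width = spread.getWidth();
        int height = spread.getHeight();
        int half = width / 2;
        BufferedImage left = spread.getSubimage(0, 0, half, height);
        // The right half absorbs the leftover column when the width is odd.
        BufferedImage right = spread.getSubimage(half, 0, width - half, height);
        return new BufferedImage[] { left, right };
    }
}
```

Because the halves share pixels with the spread, copy them first if the spread will be modified or reused afterwards.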
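
The page-number arithmetic in the quoted code matches saddle-stitched booklet order: the first scanned side carries the last page and page 1, the next side carries page 2 and the second-to-last page, and so on. Pulled out into a testable function, assuming `pageCount` in the quoted code is the number of scanned 2-up sides and that the sides were scanned in sheet order, front then back (`BookletOrder` is a hypothetical name):

```java
// Hypothetical helper; reproduces the arithmetic from the quoted code.
final class BookletOrder {
    private BookletOrder() {}

    // For scanned side i (0-based) of a booklet with sideCount 2-up sides
    // (2 * sideCount pages in total), returns {leftPageNo, rightPageNo}
    // as 1-based page numbers.
    static int[] pagesForSide(int i, int sideCount) {
        int total = sideCount * 2;
        boolean even = i % 2 == 0;
        int right = even ? i + 1 : total - i;
        int left = even ? total - i : i + 1;
        return new int[] { left, right };
    }
}
```

With 4 scanned sides (an 8-page booklet), the sides map to 8|1, 2|7, 6|3 and 4|5, so every page from 1 to 8 appears exactly once.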
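
The quoted code calls a `rotate90DX` helper that isn't shown in the thread. One possible stand-in, assuming a clockwise rotation (the thread doesn't say which direction), together with the width/height-ratio check Tilman mentions; `Rotation`, `isLandscape` and `rotate90Clockwise` are hypothetical names:

```java
import java.awt.image.BufferedImage;

// Hypothetical stand-in for the unshown rotate90DX helper; JDK only.
final class Rotation {
    private Rotation() {}

    // Treats an image as landscape when it is wider than it is tall.
    static boolean isLandscape(BufferedImage img) {
        return img.getWidth() > img.getHeight();
    }

    // Rotates 90 degrees clockwise by copying pixels: source (x, y)
    // lands at (h - 1 - y, x) in a destination of size h x w.
    static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

The pixel loop is simple rather than fast; for large scans, an `AffineTransformOp` would be the usual faster alternative.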
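
For printing, the thread describes placing each page in the upper right of a letter-sized sheet for simplex, and alternating upper right and upper left for duplex. A minimal JDK sketch; it assumes odd page numbers are front sides (upper right) and even page numbers are back sides (upper left), works at a caller-chosen DPI, and only shrinks images that are larger than the sheet. `LetterLayout` is a hypothetical name:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical layout helper; JDK only.
final class LetterLayout {
    private LetterLayout() {}

    // US Letter is 8.5 x 11 inches; convert to pixels at the given DPI.
    static int letterWidthPx(int dpi) {
        return (int) Math.round(8.5 * dpi);
    }

    static int letterHeightPx(int dpi) {
        return 11 * dpi;
    }

    // Simplex: always upper right. Duplex (assumption): odd page numbers are
    // front sides (upper right), even page numbers are back sides (upper left).
    static boolean placeOnRight(int pageNo, boolean duplex) {
        return !duplex || pageNo % 2 != 0;
    }

    // Draws the page image in the chosen upper corner of a white letter-sized
    // sheet, shrinking it proportionally only if it does not fit.
    static BufferedImage place(BufferedImage page, int pageNo, boolean duplex, int dpi) {
        int sheetW = letterWidthPx(dpi);
        int sheetH = letterHeightPx(dpi);
        double scale = Math.min(1.0, Math.min(
                (double) sheetW / page.getWidth(), (double) sheetH / page.getHeight()));
        int w = (int) Math.round(page.getWidth() * scale);
        int h = (int) Math.round(page.getHeight() * scale);

        BufferedImage sheet = new BufferedImage(sheetW, sheetH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = sheet.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, sheetW, sheetH);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            int x = placeOnRight(pageNo, duplex) ? sheetW - w : 0;
            g.drawImage(page, x, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return sheet;
    }
}
```

At 10 DPI, for instance, the sheet is 85 x 110 pixels, and a 40 x 50 page image lands at x = 45 on a front side and x = 0 on a back side.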