MIME-Version: 1.0
Date: Fri, 25 Sep 2015 07:02:35 -0400
Subject: Re: PDFBOX 2 scanned documents
From: Tim Daley
To: users@pdfbox.apache.org
Content-Type: multipart/alternative; boundary=f46d04428e24fd251005209048b3

--f46d04428e24fd251005209048b3
Content-Type: text/plain; charset=UTF-8

Tims-MacBook-Pro-2:Downloads tdaley$ jar -xvf pdfbox-app-2.0.0-20150925.080424-1677-sources.jar
created: META-INF/
inflated: META-INF/MANIFEST.MF
inflated: META-INF/DEPENDENCIES
inflated: META-INF/NOTICE
inflated: META-INF/LICENSE
Tims-MacBook-Pro-2:Downloads tdaley$

On Fri, Sep 25, 2015 at 1:13 AM, Tilman Hausherr wrote:

> On 25.09.2015 at 01:50, Tim Daley wrote:
>
>> I checked the Maven source for the package and it only contained metadata.
>> I would have tried to check it out.
>
> Not sure what the question is, but the latest 2.0 app snapshot is here:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
>
> Just download the latest jar file.
>
> Tilman
>
>> On Thu, Sep 24, 2015 at 2:36 AM, Tilman Hausherr wrote:
>>
>>> On 24.09.2015 at 04:22, Tim Daley wrote:
>>>
>>>> Another note. When I was using PDFDebugger, I could get it to load a URL,
>>>> but when I tried the file selection menu, all the PDF files were grayed
>>>> out.
>>>>
>>>> I'm on OSX 10.10. It's not a biggie for me as I can just use a URL.
>>>
>>> Weird... It worked for me (on Windows)... Another possibility is to use
>>> drag and drop.
>>>
>>> Tilman
>>>
>>>> On Wed, Sep 23, 2015 at 10:11 PM, Tim Daley wrote:
>>>>
>>>>> That's what I was looking for!!!
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Wed, Sep 23, 2015 at 4:20 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>
>>>>>> On 23.09.2015 at 22:16, Tim Daley wrote:
>>>>>>
>>>>>>> The main gist of the program is to read in a multi-page PDF. Based on a
>>>>>>> control file, detect what type of document this represents:
>>>>>>>
>>>>>>> simplex/duplex
>>>>>>> portrait/landscape
>>>>>>> 1-up/2-up (in the case of books/booklets)
>>>>>>>
>>>>>>> In the case of books/booklets, the pages need to be split in half and the
>>>>>>> individual pages reordered so that they are in page number order.
>>>>>>>
>>>>>>> The resulting pages are then rotated as necessary and output either as is
>>>>>>> (for use on a tablet), or arranged on letter sized pages.
>>>>>>> In this latter case, the pages are moved to the upper right for simplex
>>>>>>> printing or alternately upper right and upper left for duplex printing.
>>>>>>>
>>>>>>> I assume the easiest way would be to build a single image, rotate the image
>>>>>>> as necessary, resize the image as necessary and write the image to a new
>>>>>>> page. If so, what operations are required to assemble all the source images
>>>>>>> into a single image? Or is there an easier way to do this with PDFBox?
>>>>>>
>>>>>> Ah, I think I understand: you look at the images in the resources, and
>>>>>> based on the width / height ratio, you make a decision.
>>>>>> However there's no guarantee that the images will come in a certain
>>>>>> sequence.
>>>>>>
>>>>>> Why not simply render the PDF pages?
>>>>>>
>>>>>> http://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
>>>>>>
>>>>>> Tilman
>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 4:01 PM, Tim Daley wrote:
>>>>>>>
>>>>>>>> It's on https://www.daley.ws/Believe.pdf
>>>>>>>>
>>>>>>>> I found PDFDebugger, thanks!
>>>>>>>>
>>>>>>>> Now that I look at my code again, it looks like I am reading a list of
>>>>>>>> images.
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 2:44 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>
>>>>>>>>> On 23.09.2015 at 20:39, Tim Daley wrote:
>>>>>>>>>
>>>>>>>>>> Whoops! I don't see PDFDebugger in PDFBox 2. Oversight? I'll get it out
>>>>>>>>>> of Version 1.
>>>>>>>>>
>>>>>>>>> You can't attach PDF files. Upload them somewhere.
>>>>>>>>>
>>>>>>>>> PDFDebugger is there, I even made a change earlier today! It is part of
>>>>>>>>> the PDFBox app jar.
>>>>>>>>>
>>>>>>>>> Tilman
>>>>>>>>>
>>>>>>>>>> On Wed, Sep 23, 2015 at 2:35 PM, Tim Daley wrote:
>>>>>>>>>>
>>>>>>>>>>> The PDF is at the bottom of the email. Aha! PDFDebugger!
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 23, 2015 at 1:31 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The XObjects should be the same count in version 1 and 2.
>>>>>>>>>>>>
>>>>>>>>>>>> If you don't want to share the PDFs, then look at them with the new
>>>>>>>>>>>> PDFDebugger. You can see the XObject images easily.
>>>>>>>>>>>>
>>>>>>>>>>>> Tilman
>>>>>>>>>>>>
>>>>>>>>>>>> On 23.09.2015 at 19:21, Tim Daley wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the basic code that used to work. Granted, it probably depends
>>>>>>>>>>>>> heavily on Version 1's structure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
>>>>>>>>>>>>> PDResources pdResources = pdPage.getResources();
>>>>>>>>>>>>>
>>>>>>>>>>>>> // Assumption: the step that filled this map (and its key type) did
>>>>>>>>>>>>> // not survive in the posting; with PDFBox 2 the map can be filled
>>>>>>>>>>>>> // from the page resources as shown here.
>>>>>>>>>>>>> Map<String, PDXObject> images = new TreeMap<String, PDXObject>();
>>>>>>>>>>>>> for (COSName name : pdResources.getXObjectNames())
>>>>>>>>>>>>>     images.put(name.getName(), pdResources.getXObject(name));
>>>>>>>>>>>>>
>>>>>>>>>>>>> for (Entry<String, PDXObject> objectImageEntry : images.entrySet())
>>>>>>>>>>>>> {
>>>>>>>>>>>>>     PDXObject pdXObject = objectImageEntry.getValue();
>>>>>>>>>>>>>     if (pdXObject instanceof PDImageXObject)
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>         PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
>>>>>>>>>>>>>         BufferedImage bufferedImage = null;
>>>>>>>>>>>>>         try { bufferedImage = pdXObjectImage.getImage(); }
>>>>>>>>>>>>>         catch (Throwable t)
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>             t.printStackTrace();
>>>>>>>>>>>>>             randomAccessFile.close();
>>>>>>>>>>>>>             throw new RuntimeException(t);
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getLandscape())
>>>>>>>>>>>>>             bufferedImage = rotate90DX(bufferedImage);
>>>>>>>>>>>>>         int width = bufferedImage.getWidth();
>>>>>>>>>>>>>         int height = bufferedImage.getHeight();
>>>>>>>>>>>>>         if (CFCAPDFInputProgressBar.this.music.getTwoPage())
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>             // 2-up sheet: split it into a left and a right half page.
>>>>>>>>>>>>>             width /= 2;
>>>>>>>>>>>>>             boolean even = i%2 == 0;
>>>>>>>>>>>>>             int rightPageNo = even ? i+1 : pageCount*2-i;
>>>>>>>>>>>>>             int leftPageNo = even ? pageCount*2-i : i+1;
>>>>>>>>>>>>>             putPage(bufferedImage, rightPageNo, width, 0, width, height);
>>>>>>>>>>>>>             putPage(bufferedImage, leftPageNo, 0, 0, width, height);
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>         else
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>             int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
>>>>>>>>>>>>>             putPage(bufferedImage, pageNo, 0, 0, width, height);
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 23.09.2015 at 17:33, Tim Daley wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It appears that PDFBOX 2 handles scanned documents differently than
>>>>>>>>>>>>>>> PDFBOX 1.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have multipage PDFs that I have scanned from a Konica/Minolta C224e.
>>>>>>>>>>>>>>> The PDFs in version 1 seemed to come in as a single image. Now in
>>>>>>>>>>>>>>> version 2, they seem to come in as multiple images.
>>>>>>>>>>>>>>> I assume this is to reduce the size of the resultant PDFs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there a way to retrieve each page as a single image or is there a
>>>>>>>>>>>>>>> method to merge all the images on a page into a single image?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can't comment without having a sample PDF. And I don't know what you
>>>>>>>>>>>>>> mean with "seemed to come in as a single image".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org

--
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o: 407-826-2911 | m: 407-716-0284
tim.daley@cru.org

--f46d04428e24fd251005209048b3--
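
A footnote to the thread: the 2-up page arithmetic in the code quoted above can be tried on its own, without PDFBox. What follows is only a minimal sketch using the Java standard library. It assumes each sheet side is already available as a BufferedImage (for instance after rendering the page, as Tilman suggests). The names BookletSplitter, split, leftPageNo and rightPageNo are hypothetical; they are not part of PDFBox or of the original program.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of PDFBox: splits 2-up booklet sheet sides
// into single pages keyed by page number.
final class BookletSplitter {

    // Page number on the right half of 0-based sheet side i, using the same
    // formula as the quoted code (sideCount is the number of sheet sides).
    static int rightPageNo(int i, int sideCount) {
        return (i % 2 == 0) ? i + 1 : sideCount * 2 - i;
    }

    // Page number on the left half of 0-based sheet side i.
    static int leftPageNo(int i, int sideCount) {
        return (i % 2 == 0) ? sideCount * 2 - i : i + 1;
    }

    // Cuts every side in half; the TreeMap keeps the halves in page order.
    // Note that getSubimage shares pixel data with the source image.
    static Map<Integer, BufferedImage> split(List<BufferedImage> sides) {
        Map<Integer, BufferedImage> pages = new TreeMap<>();
        int n = sides.size();
        for (int i = 0; i < n; i++) {
            BufferedImage side = sides.get(i);
            int half = side.getWidth() / 2;
            int h = side.getHeight();
            pages.put(leftPageNo(i, n), side.getSubimage(0, 0, half, h));
            pages.put(rightPageNo(i, n), side.getSubimage(half, 0, half, h));
        }
        return pages;
    }

    public static void main(String[] args) {
        // Four blank sheet sides stand in for rendered pages.
        List<BufferedImage> sides = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            sides.add(new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB));
        }
        for (int i = 0; i < sides.size(); i++) {
            System.out.println("side " + i + ": left=" + leftPageNo(i, 4)
                    + " right=" + rightPageNo(i, 4));
        }
        System.out.println("pages: " + split(sides).keySet());
    }
}
```

With four sides this pairs pages 8|1, 2|7, 6|3 and 4|5, which is the usual saddle-stitch layout. Keying the map by page number also avoids relying on the order in which images happen to be listed, which Tilman points out is not guaranteed.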