Subject: Re: Overlay 2 files partially
To: users@pdfbox.apache.org
From: Tilman Hausherr
Date: Tue, 17 May 2016 09:37:42 +0200

On 17.05.2016 at 04:20, Romain Guillaume wrote:
> Hi everyone,
>
> I would like to overlay 2 PDF files, but with particular modifications.
> I know how to overlay 2 PDFs, but sometimes I need to remove some elements
> of one of them during the overlay operation.
> For example, imagine an invoice composed of 2 files:
> - one is the background page (containing a logo or other fixed graphical
>   elements)
> - one is the text page (containing amounts, dates, invoice number, ...)
> As you probably guessed, to obtain the final invoice I overlay these 2 pages
> (and it works perfectly in 99.99% of cases).
> That brings me to my problem. Sometimes the "text page" is somewhat
> "dirty": some text areas have a white background instead of a
> transparent one. So when I do the overlay, I see white areas on the final
> invoice which cover parts of the background page (they hide some
> graphical elements and they should not).
> So my question is how to say, during the overlay operation: "don't keep
> elements which are white backgrounds, or replace them with transparent
> backgrounds". I don't know how to parse each element of the PDF and drop it
> if it is a white background.
> My question is not "how to keep text only". I want to remove only the white
> backgrounds (or replace them with transparent backgrounds) and keep all
> other elements (all images, all text, all backgrounds which are not
> white, ...).
> I use PDFBox 1.8.11.
>
> Thank you in advance for your help.
>

Tricky, even if you can share the file.

You should look at the file with the PDFDebugger app (2.x is better). Then find path operators like m, l and re in the content stream of a page. Then find the color assignment for the nonstroking color (could be cs, sc, scn, k, g, rg), and/or insert a transparent graphics state parameter and restore it later. Get the token list, change it, and rewrite it into a new content stream.

It might be even more tricky if the PDF uses forms (form XObjects: nested elements with their own content stream), or if the colors are specified in a way that makes it hard to tell what is white.

However, paths are also used to draw lines, boxes, etc. that you don't want to remove. And what about invoices that come with a background image? Or a logo? Or an image that is actually a bunch of vector graphics?

This is a terrible assignment, maybe the result of a poor business decision.

Tilman
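
To make the "get the token list, change it, rewrite it" idea more concrete, here is a minimal sketch of just the filtering step. It does not use PDFBox at all: the content stream is modelled as a plain list of string tokens, and the class and method names (WhiteFillFilter, dropWhiteFills, withoutFill) are made up for illustration. In a real program the tokens would presumably come from PDFBox's PDFStreamParser and be written back with ContentStreamWriter. The sketch tracks the nonstroking color through q/Q and, while that color is plainly white (1 g, 1 1 1 rg, 0 0 0 0 k), swaps fill operators for their non-filling counterparts instead of deleting the path. Colors set via cs/sc/scn are treated as "not white" because they cannot be judged without the color space, and form XObjects, images and text are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class WhiteFillFilter {

    /** Returns a copy of the tokens in which paths filled with plain white are no longer filled. */
    public static List<String> dropWhiteFills(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<Float> operands = new ArrayList<>();   // numeric operands seen since the last operator
        Deque<Boolean> saved = new ArrayDeque<>();  // fill state saved by q, restored by Q
        boolean whiteFill = false;                  // the initial nonstroking color is black

        for (String t : tokens) {
            Float number = parseNumber(t);
            if (number != null || t.startsWith("/")) {
                // Operand (number or name): copy it through unchanged.
                if (number != null) {
                    operands.add(number);
                }
                out.add(t);
                continue;
            }
            switch (t) {
                case "q":  saved.push(whiteFill); break;
                case "Q":  if (!saved.isEmpty()) { whiteFill = saved.pop(); } break;
                case "g":  whiteFill = allEqual(operands, 1, 1f); break; // gray 1 = white
                case "rg": whiteFill = allEqual(operands, 3, 1f); break; // RGB 1 1 1 = white
                case "k":  whiteFill = allEqual(operands, 4, 0f); break; // CMYK 0 0 0 0 = white
                case "cs": case "sc": case "scn":
                    whiteFill = false; // unknown color space: assume not white
                    break;
                default:
                    break;
            }
            out.add(whiteFill ? withoutFill(t) : t);
            operands.clear();
        }
        return out;
    }

    /** Maps a path-painting operator to its variant that does not fill; other operators pass through. */
    public static String withoutFill(String op) {
        switch (op) {
            case "f": case "F": case "f*": return "n"; // end path without painting
            case "B": case "B*":           return "S"; // stroke only
            case "b": case "b*":           return "s"; // close and stroke only
            default:                       return op;
        }
    }

    private static boolean allEqual(List<Float> values, int count, float expected) {
        if (values.size() != count) {
            return false;
        }
        for (float v : values) {
            if (v != expected) {
                return false;
            }
        }
        return true;
    }

    private static Float parseNumber(String token) {
        try {
            return Float.valueOf(token);
        } catch (NumberFormatException e) {
            return null; // not a number, so an operator or a name
        }
    }

    public static void main(String[] args) {
        // A white rectangle inside q/Q, followed by a black rectangle.
        List<String> tokens = Arrays.asList(
                "q", "1", "1", "1", "rg", "10", "10", "200", "50", "re", "f", "Q",
                "0", "g", "10", "70", "200", "50", "re", "f");
        System.out.println(String.join(" ", dropWhiteFills(tokens)));
    }
}
```

Replacing the paint operator rather than removing the path keeps the operand/operator structure of the stream intact, but it is only one possible variant, and it still has the problems mentioned above: it cannot tell a "dirty" white background from a white box that is meant to be there.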