Subject: svn commit: r1711439 - /pdfbox/cmssite/branches/jekyll-migration/content/2.0/migration.md
Date: Fri, 30 Oct 2015 10:13:32 -0000
To: commits@pdfbox.apache.org
From: msahyoun@apache.org
Reply-To: dev@pdfbox.apache.org
Message-Id: <20151030101332.79C743A0042@svn01-us-west.apache.org>

Author: msahyoun
Date: Fri Oct 30 10:13:32 2015
New Revision: 1711439

URL: http://svn.apache.org/viewvc?rev=1711439&view=rev
Log:
PDFBOX-3030: correct Text Extraction information

Modified:
    pdfbox/cmssite/branches/jekyll-migration/content/2.0/migration.md

Modified: pdfbox/cmssite/branches/jekyll-migration/content/2.0/migration.md
URL: http://svn.apache.org/viewvc/pdfbox/cmssite/branches/jekyll-migration/content/2.0/migration.md?rev=1711439&r1=1711438&r2=1711439&view=diff
==============================================================================
--- pdfbox/cmssite/branches/jekyll-migration/content/2.0/migration.md (original)
+++ pdfbox/cmssite/branches/jekyll-migration/content/2.0/migration.md Fri Oct 30 10:13:32 2015
@@ -123,8 +123,8 @@ if (job.printDialog()) {
 Advanced use case examples can be found in th examples package under org/apache/pdfbox/examples/printing/Printing.java
 
 ### Text Extraction
-``PDFTextStripper`` no longer sets the color information in the ``PDGraphicsState ``. If you need color information for the text being processed
-you can extend ``PDFTextStripper``and add the following ``Operators`` to the constructor:
+In 1.8, to get the text colors, one method was to pass an expanded .properties file to the PDFStripper constructor. To achieve the same
+in PDFBox 2.0 you can extend ``PDFTextStripper``and add the following ``Operators`` to the constructor:
 
 ~~~java
 addOperator(new SetStrokingColorSpace());
@@ -149,6 +149,8 @@ tree are now represented by the `PDNonTe
 With PDFBox 2.0.0 the prefered way to iterate through the fields is now
 
 ~~~java
+PDAcroForm form;
+...
 for (PDField field : form.getFieldTree())
 {
     ... (do something)