From: "Hawkins, Thomas A.
 - Student"
To: "users@pdfbox.apache.org"
Subject: RE: PDFBox and superscript format .NET
Date: Fri, 18 May 2012 05:37:51 +0000

As an addendum: I didn't realize this when I sent the message out, but the numbers are a combination of regular and superscript digits. Since email won't support superscript, mathematical operators it is. The numbers should be:

8^5 (INSTEAD OF 85)
9^6 (INSTEAD OF 96)
4^7 (INSTEAD OF 47)
10^4 (INSTEAD OF 104)
________________________________________
From: Hawkins, Thomas A. - Student [thawkins@midway.edu]
Sent: Friday, May 18, 2012 1:21 AM
To: users@pdfbox.apache.org
Subject: PDFBox and superscript format .NET

I am using the .NET version of PDFBox and I have a PDF that contains data such as this:

Name               Location
Jim Daviees        85
Herschel Walker    96
Vince Gogh         47
Andrew Lincoln     104

I need both the name value and the location value.
When I use the following code:

    Imports org.apache.pdfbox.pdmodel
    Imports org.apache.pdfbox.util

    ' fi refers to the source PDF file
    Dim p As PDDocument = PDDocument.load(fi.FullName)
    Dim r As PDFTextStripper = New PDFTextStripper()
    Dim stringVal As String = r.getText(p) ' extract all text from the document
    Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)

I get the following in the .txt file (and also in HTML when I've converted it to that):

Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
85
96
47
104

I'm okay with the layout, as I've got a workaround for that. My problem is that the extraction drops the superscript formatting, so the exponents become indistinguishable from the regular digits. Is there a way that I can locate these superscript parts and enclose them in brackets or something similar, so that the returned value is more like this:

Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
8[5]
9[6]
4[7]
10[4]

So, nutshell time: can I use PDFBox (.NET version) to locate the instances of superscript in a PDF file (like locating superscript markup in HTML) and swap them out for an easily recognized symbol in the output written to my destination file? I picked brackets because I have no brackets in my source file whatsoever, and they would be very easy for me to code around. Thanks in advance.
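
To make the desired output concrete, here is a rough sketch of the bracketing step on its own, written in Java since PDFBox itself is a Java library. It is only an illustration under stated assumptions: Glyph is a hypothetical stand-in for one extracted character and not a PDFBox class, the two thresholds are guesses, and the coordinate convention (larger Y means higher on the page) is assumed. I believe PDFBox exposes per-character font size and position through its TextPosition objects, which might be able to feed something like this, but I have not verified that in the .NET build.

```java
import java.util.List;

final class SuperscriptBracketer {

    /** Hypothetical stand-in for one extracted character; not a PDFBox class. */
    static final class Glyph {
        final String text;      // the character drawn
        final float fontSize;   // font size in points
        final float baselineY;  // baseline height; larger means higher on the page (assumed)

        Glyph(String text, float fontSize, float baselineY) {
            this.text = text;
            this.fontSize = fontSize;
            this.baselineY = baselineY;
        }
    }

    // Assumed thresholds; they would need tuning against the real document.
    static final float SIZE_RATIO = 0.85f; // superscript is noticeably smaller than the base text
    static final float MIN_RISE = 0.5f;    // and its baseline sits above the base text's baseline

    /** Joins the glyphs of one line, wrapping each run of superscript glyphs in brackets. */
    static String bracket(List<Glyph> line) {
        StringBuilder out = new StringBuilder();
        Glyph base = null;        // most recent glyph judged to be normal text
        boolean inSuper = false;  // true while inside an open bracket
        for (Glyph g : line) {
            // Without a preceding normal glyph there is nothing to compare against,
            // so the first glyph of a line is always treated as normal text.
            boolean isSuper = base != null
                    && g.fontSize < base.fontSize * SIZE_RATIO
                    && g.baselineY - base.baselineY > MIN_RISE;
            if (isSuper && !inSuper) {
                out.append('[');
            }
            if (!isSuper && inSuper) {
                out.append(']');
            }
            if (!isSuper) {
                base = g;
            }
            out.append(g.text);
            inSuper = isSuper;
        }
        if (inSuper) {
            out.append(']');
        }
        return out.toString();
    }
}
```

With this sketch, the glyphs "1" and "0" at 12 pt on baseline 100 followed by "4" at 7 pt on baseline 105 come back as 10[4], which is the form I am after.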