Date: Thu, 2 Jan 2014 17:10:08 +0000 (UTC)
From: "Chris Bamford (JIRA)"
To: dev@pdfbox.apache.org
Subject: [jira] [Commented] (PDFBOX-1502) Not Extracting Text from PDF Document

    [ https://issues.apache.org/jira/browse/PDFBOX-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860341#comment-13860341 ]

Chris Bamford commented on PDFBOX-1502:
---------------------------------------

Hi Andreas,

I'm puzzled as to why this issue was closed. It says you were unable to reproduce the fault Deepak described - does this mean that you were able to extract the tokens "C23445", "Mimecast", "Fred" and "Box" from the *edited* version of the PDF (Renewal_Advice_Edited.pdf)?

If so, that's fantastic - please advise how you did it, as we have had no luck (our current version of PDFBox is 1.8.2).

Thanks so much.
- Chris

> Not Extracting Text from PDF Document
> -------------------------------------
>
>                 Key: PDFBOX-1502
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1502
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator, 1.7.1, 1.8.0
>         Environment: Mac OS , jdk 1.7
>            Reporter: deepak
>            Assignee: Andreas Lehmkühler
>         Attachments: PDFBOX1502-RenewalAdvice.txt, Renewal Advice .pdf, Renewal_Advice_Edited.pdf, Renewal_Advice_Edited_Extracted_Text.txt
>
>
> PDDocument document = PDDocument.load(Inputstream);
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.getText(document) is not returning some text content in the attached PDF Document . It is just returning the form fields but the values are empty . The bug is reproducible both in 1.8.0-Snapshot and 1.7.1 codebase.
> Please help in resolving the issue

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)