Mailing-List: contact dev-help@pdfbox.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@pdfbox.apache.org
Date: Thu, 2 Jan 2014 17:10:08 +0000 (UTC)
From: "Chris Bamford (JIRA)" <jira@apache.org>
To: dev@pdfbox.apache.org
Message-ID: <JIRA.12629714.1359475826606.31543.1388682608618@arcas>
In-Reply-To: <JIRA.12629714.1359475826606@arcas>
References: <JIRA.12629714.1359475826606@arcas>
Subject: [jira] [Commented] (PDFBOX-1502) Not Extracting Text from PDF
 Document
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/PDFBOX-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860341#comment-13860341 ]

Chris Bamford commented on PDFBOX-1502:
---------------------------------------

Hi Andreas

I'm puzzled as to why this issue was closed. It says you were unable to reproduce the fault Deepak described. Does this mean that you were able to extract the tokens "C23445", "Mimecast", "Fred" and "Box" from the *edited* version of the PDF (Renewal_Advice_Edited.pdf)?
If so, that's fantastic. Please advise how you did it, as we have had no luck (our current version of PDFBox is 1.8.2).

Thanks so much.

- Chris

> Not Extracting Text from PDF Document
> -------------------------------------
>
>                 Key: PDFBOX-1502
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1502
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator, 1.7.1, 1.8.0
>         Environment: Mac OS , jdk 1.7
>            Reporter: deepak
>            Assignee: Andreas Lehmkühler
>         Attachments: PDFBOX1502-RenewalAdvice.txt, Renewal Advice .pdf, Renewal_Advice_Edited.pdf, Renewal_Advice_Edited_Extracted_Text.txt
>
>
> PDDocument document = PDDocument.load(Inputstream);
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.getText(document) is not returning some of the text content in the attached PDF document. It returns only the form fields, but their values are empty. The bug is reproducible in both the 1.8.0-SNAPSHOT and 1.7.1 codebases.
> Please help in resolving the issue.

