pdfbox-users mailing list archives

From Aalok Agrawal <aal...@gmail.com>
Subject Re: How to extract pdf content from a html page
Date Thu, 24 Aug 2017 17:27:30 GMT
I have written the following code:

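// Open the URL and buffer the response.
URL url = new URL(strURL);
BufferedInputStream file = new BufferedInputStream(url.openStream());
String parsedText = null;
try {
    // Parse the downloaded bytes into a COS-level document.
    PDFParser parser = new PDFParser(file);
    parser.parse();
    COSDocument cosDoc = parser.getDocument();

    // Wrap it in a PDDocument and extract the text.
    PDDocument pdDoc = new PDDocument(cosDoc);
    try {
        PDFTextStripper pdfStripper = new PDFTextStripper();
        parsedText = pdfStripper.getText(pdDoc);
    } finally {
        pdDoc.close(); // also closes the underlying COSDocument
    }
} finally {
    file.close();
}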

But it does not fetch the content of the PDF embedded in the browser.
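
One thing that may be worth checking, offered as a sketch and not a confirmed diagnosis: when a PDF is displayed inside a browser page, the page URL can return HTML that embeds a viewer instead of the PDF bytes themselves, and a PDF parser cannot read that. A well-formed PDF file starts with the marker %PDF-, so peeking at the first bytes of the response shows which case applies. The class name PdfSniffer and the method looksLikePdf below are hypothetical helpers written for this illustration; they are not part of PDFBox and use only the Java standard library.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a PDFBox class): checks whether a stream holds PDF data.
final class PdfSniffer {
    // A well-formed PDF file begins with this marker, for example "%PDF-1.4".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {
    }

    /**
     * Returns true if the stream starts with the PDF marker.
     * The stream must support mark/reset (BufferedInputStream does).
     * It is rewound afterwards, so a parser can still read it from the start.
     */
    static boolean looksLikePdf(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PDF_MAGIC.length);
        byte[] head = new byte[PDF_MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        in.reset();
        return read == PDF_MAGIC.length && Arrays.equals(head, PDF_MAGIC);
    }
}
```

With the code above, this would mean calling PdfSniffer.looksLikePdf(file) on the BufferedInputStream before handing it to the parser. If it returns false, the response is probably an HTML page, and the address of the actual PDF would have to be found first, possibly in the page source or in the browser's network log.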

On Thu, Aug 24, 2017 at 9:08 PM, Gilad Denneboom <gilad.denneboom@gmail.com>
wrote:

> If you don't know the file's URL or the path of the local temp file to
> which it is saved, I don't see how you could do it.
>
> On Thu, Aug 24, 2017 at 4:08 PM, Aalok Agrawal <aaloka@gmail.com> wrote:
>
> > Hi,
> >
> > I am working on an application where a PDF is rendered in the browser.
> > There is no .pdf extension in the URL.
> >
> > I have to read the content of the PDF and check some text. Is there any
> > way to do that?
> >
> > Thanks
> > Aalok Agrawal
> >
>
