pdfbox-users mailing list archives

From Miran Damjanovic <MI...@statoil.com>
Subject PDFBox Parsing problem - EOF
Date Wed, 10 Apr 2013 08:56:43 GMT

I have been using PDFBox to extract text from PDFs and validate some of it. Recently I have had
problems parsing the PDFs; more precisely, I get a java.io.IOException. I use the following
method to get the text from a PDF:
public String getTextFromPDF(URL url, int readTimeout, int connectTimeout) throws IOException {
            try {
                  //open connection
                  HttpURLConnection conn =  (HttpURLConnection) url.openConnection();

                  //set caching to false
                  conn.setUseCaches( false );

                  //set read timeout
                  conn.setReadTimeout( readTimeout );

                  //set connect timeout
                  conn.setConnectTimeout( connectTimeout );

                  //get input stream from connection
                  InputStream fileToParse = conn.getInputStream();

                  System.out.println( fileToParse.toString());

                  //parser object
                  PDFParser parser = new PDFParser(fileToParse, null, true);

                  //do parse
                  parser.parse();

                  //get document
                  PDDocument pdoc = parser.getPDDocument();

                  //get stripper object
                  PDFTextStripper stripper = new PDFTextStripper();

                  //get text
                  String text = stripper.getText( pdoc );

                  //close doc
                  pdoc.close();

                  //reset connection (set to nothing)
                  conn = null;

                  //reset file
                  fileToParse = null;

                  //reset parser
                  parser = null;

                  //return content
                  return text;
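
One way to narrow down an EOF error like this is to check whether the download itself is complete before PDFBox sees it. The following is only a sketch under that assumption (the real cause may be different): `PdfDownloadCheck` and its methods are hypothetical helper names, not part of PDFBox, and the 1024-byte search windows are an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of PDFBox): buffers a download and
// runs two cheap sanity checks on the bytes before they are parsed.
class PdfDownloadCheck {

    // Reads the stream to its end and returns everything as a byte array.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // True if the "%PDF-" header appears within the first 1024 bytes.
    static boolean hasPdfHeader(byte[] data) {
        int len = Math.min(data.length, 1024);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-");
    }

    // True if the "%%EOF" trailer marker appears within the last 1024 bytes.
    // A truncated download will usually fail this check.
    static boolean hasEofMarker(byte[] data) {
        int start = Math.max(0, data.length - 1024);
        String tail = new String(data, start, data.length - start,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }
}
```

In the method above, the bytes from `conn.getInputStream()` could be passed through `readFully` first. If either check fails, the server probably returned an error page or an incomplete file. Otherwise the buffered bytes can be wrapped in a `java.io.ByteArrayInputStream` and handed to the parser in place of the live connection stream.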

The error message I get is this (line 51 is where I call parser.parse() above):

I appreciate any tips and help you can provide. Many thanks in advance,
Miran Damjanovic
