pdfbox-users mailing list archives

From Natalia Gómez García <natalia.gmz.gar...@gmail.com>
Subject Problems with Java PDFBox
Date Sun, 09 Sep 2012 09:13:10 GMT
Hello,

I am a computer science student, and I'm using your PDFBox library in Java
to extract text from some PDF files.

In this project, I am having trouble extracting the text from this
PDF: http://www.escet.urjc.es/alumnos/horarios/GR_Biologia_2012-13.pdf
Specifically, I can't manage to extract the text "Semana del 3 al 7 de
Septiembre de 2012" ("Week of 3 to 7 September 2012").

Why might this be happening? Could you please give me some pointers on how
to extract this text?

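This is the code I'm currently using:
String texto;
PDDocument pdfDoc = PDDocument.load(url); // url is a java.net.URL
try {
    PDFTextStripper pdfStripper = new PDFTextStripper();
    texto = pdfStripper.getText(pdfDoc);
} finally {
    pdfDoc.close(); // release the document even if extraction fails
}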
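If the stripper does return the page text but a plain `texto.contains(...)` check for the phrase still fails, one possible cause (an assumption, not a diagnosis of this particular file) is that the phrase comes out split across lines, with doubled or non-breaking spaces, or with accents stored in decomposed form. The sketch below uses only the JDK; the class `PhraseSearch` and its methods are hypothetical helpers, not part of PDFBox. It compares the texts after collapsing whitespace, applying Unicode NFC normalization and lower-casing. It will not help if the phrase is missing from the output entirely, for example because it is drawn as an image rather than as text.

```java
import java.text.Normalizer;
import java.util.Locale;

/**
 * Hypothetical helper (not part of PDFBox): checks whether a phrase occurs in
 * extracted text while ignoring differences in whitespace, line breaks,
 * Unicode composition and letter case.
 */
public final class PhraseSearch {

    private PhraseSearch() {
    }

    /** Collapses every run of whitespace, including non-breaking spaces, to one space. */
    static String normalize(String s) {
        // NFC, so a base letter followed by a combining accent matches the precomposed letter
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        // Java's \s does not match the non-breaking space (U+00A0), so it is listed explicitly
        return nfc.replaceAll("[\\s\\u00A0]+", " ").trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true if {@code phrase} appears in {@code text} after normalization. */
    public static boolean containsPhrase(String text, String phrase) {
        return normalize(text).contains(normalize(phrase));
    }

    public static void main(String[] args) {
        // Simulated extractor output with a line break and a double space inside the phrase
        String extracted = "HORARIO\nSemana del 3 al\n7 de  Septiembre de 2012\n";
        String phrase = "Semana del 3 al 7 de Septiembre de 2012";
        System.out.println(extracted.contains(phrase));
        System.out.println(containsPhrase(extracted, phrase));
    }
}
```

Running `main` prints `false` and then `true`: the literal check misses the phrase because of the line break, while the normalized comparison finds it.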

Thanks for your attention
Natalia
