pdfbox-users mailing list archives

From "Hesham G." <heshamgne...@gmail.com>
Subject Spaces are ignored when reading a PDF file
Date Thu, 17 Mar 2016 06:12:05 GMT
Hello,

I have a PDF file created using LaTeX. I am trying to read and print every character in that
file using PDFBox, but when I do this, all spaces in the file are ignored. Here is the code
I am using:
PDPage page = (PDPage)allPages.get( 0 );
PDStream contents = page.getContents();
if ( contents != null ) {
    PDFTextStripperProcessor pdfTextStripperProcessor = new PDFTextStripperProcessor();
    pdfTextStripperProcessor.processStream( page, page.findResources(), contents.getStream() );
}

public class PDFTextStripperProcessor extends PDFTextStripper {
    @Override
    public void processTextPosition( TextPosition text )  {
        System.out.println( text.getCharacter() );
    }
}

You can test it with this one-page sample file:
https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf

What is the cause of this issue, please?


Best regards,
Hesham