pdfbox-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Created: (PDFBOX-349) Spaces between words ignored in scanned pdf files
Date Mon, 04 Aug 2008 17:52:44 GMT
Spaces between words ignored in scanned pdf files

                 Key: PDFBOX-349
                 URL: https://issues.apache.org/jira/browse/PDFBOX-349
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
            Reporter: Jukka Zitting

[Issue from SourceForge]

I am using PDF-Box-0.7.3.dll with C# and have tested extraction on two
searchable pdfs that I have scanned in from paper. Spaces between words are
ignored for both files. I have also tested another pdf file (which I
downloaded from the internet) and it was parsed correctly. Unfortunately,
the file is 1.2MB and the upload was blocked. Please send me an email
(gkobzeff@hotmail.com) and I will reply with the file.

Thanks for looking into this.


[Comment on SourceForge]
Date: 2008-03-23 21:24
Sender: gkobzeff
Logged In: YES 
Originator: YES

I have rescanned the file at a smaller file size and attached it.

File Added: Advanced Pain Mgmt BW.pdf

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
