pdfbox-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Created: (PDFBOX-349) Spaces between words ignored in scanned pdf files
Date Mon, 04 Aug 2008 17:52:44 GMT
Spaces between words ignored in scanned pdf files
-------------------------------------------------

                 Key: PDFBOX-349
                 URL: https://issues.apache.org/jira/browse/PDFBOX-349
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
            Reporter: Jukka Zitting


[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1922502&group_id=78314&atid=552832

I am using PDF-Box-0.7.3.dll with C# and have tested text extraction on two
searchable PDFs that I scanned in from paper. Spaces between words are
ignored in both files. I also tested another PDF file (which I
downloaded from the internet), and it was parsed correctly. Unfortunately,
the file is 1.2MB and the upload was blocked. Please send me an email
(gkobzeff@hotmail.com) and I will reply with the file.

Thanks for looking into this.

Greg

[Comment on SourceForge]
Date: 2008-03-23 21:24
Sender: gkobzeff
Logged In: YES 
user_id=2042611
Originator: YES

I have rescanned the file at a smaller file size and attached it.

Thanks
File Added: Advanced Pain Mgmt BW.pdf
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&file_id=271548&aid=1922502

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

