pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] Updated: (PDFBOX-349) Spaces between words ignored in scanned pdf files
Date Tue, 18 Aug 2009 16:23:15 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-349:
--------------------------------------

    Fix Version/s: 0.8.0-incubator

> Spaces between words ignored in scanned pdf files
> -------------------------------------------------
>
>                 Key: PDFBOX-349
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-349
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Jukka Zitting
>             Fix For: 0.8.0-incubator
>
>         Attachments: SpacingFix.zip, UpdatedSpacingRegressionFiles.zip
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1922502&group_id=78314&atid=552832
> I am using PDF-Box-0.7.3.dll with C# and have tested extraction on two
> searchable pdfs that I have scanned in from paper. Spaces between words are
> ignored for both files. I have also tested another pdf file (which I
> downloaded from the internet) and it was parsed correctly. Unfortunately,
> the file is 1.2MB and the upload was blocked. Please send me an email
> (gkobzeff@hotmail.com) and I will reply back with the file.
> Thanks for looking into this.
> Greg
> [Comment on SourceForge]
> Date: 2008-03-23 21:24
> Sender: gkobzeff
> Logged In: YES 
> user_id=2042611
> Originator: YES
> I have rescanned the document at a smaller file size and attached the
> file.
> Thanks
> File Added: Advanced Pain Mgmt BW.pdf
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&file_id=271548&aid=1922502

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

