pdfbox-dev mailing list archives

From "Brian Carrier (JIRA)" <j...@apache.org>
Subject [jira] Created: (PDFBOX-430) Incorrect diacritic placement in text extraction
Date Wed, 18 Feb 2009 21:24:01 GMT
Incorrect diacritic placement in text extraction
------------------------------------------------

                 Key: PDFBOX-430
                 URL: https://issues.apache.org/jira/browse/PDFBOX-430
             Project: PDFBox
          Issue Type: Bug
            Reporter: Brian Carrier


Some PDF files store diacritics (accents over characters) as separate text elements: the file
draws a chunk of text, then backs up and places the diacritic over one of the characters in
that chunk. The current text-extraction design cannot place the diacritic over a character
inside the chunk; instead, the diacritic is appended after the end of the chunk.
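The report does not propose a fix. The sketch below shows one way to attach such a diacritic to the character it overlaps. It is written in Java (PDFBox's language) but is not PDFBox code or API. It assumes each extracted glyph carries its text plus a horizontal position and width. A diacritic is attached to the most recently emitted glyph whose horizontal extent contains the diacritic's center. A few spacing accents are mapped to their Unicode combining forms, and the result is NFC-normalized. The `Glyph` record, the `merge` method and the small accent table are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class DiacriticMerger {
    // Hypothetical glyph record: extracted text plus horizontal extent (user-space units).
    public record Glyph(String text, float x, float width) {
        float center() { return x + width / 2f; }
    }

    // Simplified check: a single combining mark (Mn) or spacing modifier symbol (Sk).
    static boolean isDiacritic(Glyph g) {
        if (g.text().length() != 1) return false;
        int type = Character.getType(g.text().charAt(0));
        return type == Character.NON_SPACING_MARK || type == Character.MODIFIER_SYMBOL;
    }

    // Assumed mapping from a few spacing accents to their combining equivalents.
    static char toCombining(char c) {
        return switch (c) {
            case '\u00B4' -> '\u0301'; // acute accent
            case '\u0060' -> '\u0300'; // grave accent
            case '\u00A8' -> '\u0308'; // diaeresis
            case '\u005E' -> '\u0302'; // circumflex
            default -> c;              // already combining, or unmapped
        };
    }

    // Glyphs arrive in content-stream order; a diacritic may follow the whole chunk.
    public static String merge(List<Glyph> glyphs) {
        List<Glyph> bases = new ArrayList<>();
        List<StringBuilder> out = new ArrayList<>();
        for (Glyph g : glyphs) {
            int target = -1;
            if (isDiacritic(g)) {
                // Search backwards: the diacritic usually "backs up" over recent text.
                for (int i = bases.size() - 1; i >= 0; i--) {
                    Glyph b = bases.get(i);
                    if (g.center() >= b.x() && g.center() <= b.x() + b.width()) {
                        target = i;
                        break;
                    }
                }
            }
            if (target >= 0) {
                // Attach the mark directly after the base character it sits over.
                out.get(target).append(toCombining(g.text().charAt(0)));
            } else {
                // Ordinary glyph, or a diacritic that overlaps nothing: emit in order.
                bases.add(g);
                out.add(new StringBuilder(g.text()));
            }
        }
        StringBuilder sb = new StringBuilder();
        for (StringBuilder s : out) sb.append(s);
        // NFC composes e.g. "e" + U+0301 into the single character U+00E9.
        return Normalizer.normalize(sb, Normalizer.Form.NFC);
    }
}
```

For the chunk "resume" followed by two acute accents positioned over the two letters "e", plain concatenation would yield "resume´´", while `merge` yields "résumé".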

The debug-diac2.pdf file in PDFBOX-429 shows this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

