pdfbox-dev mailing list archives

From "Jonck van der Kogel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PDFBOX-139) The CMapParser does not recognize essential cmap operators
Date Fri, 03 Jul 2009 11:33:48 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726906#action_12726906 ]

Jonck van der Kogel commented on PDFBOX-139:
--------------------------------------------

Does anyone know of a work-around for the time being? This bug is really annoying.

> The CMapParser does not recognize essential cmap operators
> ----------------------------------------------------------
>
>                 Key: PDFBOX-139
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-139
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1438028
> Originally submitted by vdimchev on 2006-02-24 03:48.
> This bug is directly related to the following bug I discovered in the
> database:
> [ 1208652 ] PDFTextStripper.writeText Exception:Unknown encoding for ..
> I'll try to explain it again here and supply enough resources to fix it.
> The problem is that the current implementation of the CMapParser class
> supports only the beginbfchar and beginbfrange operators.
> This is not enough: calling PDFTextStripper.writeText() throws an
> IOException with the message: Unknown encoding for 'Identity-V'.
> I also managed to produce the message: Unknown encoding for '90ms-RKSJ-H'.
> The complete stack trace is:
> java.io.IOException: Unknown encoding for 'Identity-V'
>         at org.pdfbox.encoding.EncodingManager.getEncoding(EncodingManager.java:83)
>         at org.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:627)
>         at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:476)
>         at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:332)
>         at org.pdfbox.util.operator.ShowText.process(ShowText.java:66)
>         at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:494)
>         at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:207)
>         at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
>         at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
>         at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
>         at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
> The cause of this exception is that the CMapParser does not recognize
> the begincidchar and begincidrange operators (in the case of the
> 90ms-RKSJ-H encoding) or the usecmap operator (in the case of the
> Identity-V encoding).
> The cmap files for these encodings are not parsed properly, so the
> corresponding Cmap objects contain neither one-byte nor two-byte
> mappings, and the lookup() method returns null.
> I'll attach two samples for the 90ms-RKSJ-H encoding and one for the
> Identity-V encoding.
> I'll also attach a cmap reference.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168711
> 5014.CIDFont_Spec.rar (application/octet-stream), 240282 bytes
> Reference, containing CMAP description
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168709
> ken1.pdf (application/pdf), 33713 bytes
> The Identity-V sample
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168708
> tp0404-2a.pdf (application/pdf), 11434 bytes
> The second 90ms-RKSJ-H sample
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168705
> nan_youkou.pdf (application/pdf), 7663 bytes
> The first 90ms-RKSJ-H sample
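
For anyone trying to check whether a particular cmap file would run into this, here is a rough, standalone Java sketch that scans cmap text for the three operators the report names as unrecognized (begincidchar, begincidrange, usecmap). It does not use or reproduce any PDFBox code. The class name CMapOperatorScan, the method unhandledOperators, and the sample cmap fragment are all made up for illustration, and the whitespace tokenizer is a simplification that ignores cmap comments and string syntax.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: reports which of the operators
// named in this issue occur in a piece of cmap text.
public class CMapOperatorScan {

    // Operators the report says the parser does not recognize.
    // (beginbfchar and beginbfrange are the two it says are supported.)
    static final Set<String> MISSING = new LinkedHashSet<>(
            Arrays.asList("begincidchar", "begincidrange", "usecmap"));

    // Returns the MISSING operators found in the text, in order of first
    // appearance. Tokens are split on whitespace only (a simplification).
    static Set<String> unhandledOperators(String cmapText) {
        Set<String> found = new LinkedHashSet<>();
        for (String token : cmapText.trim().split("\\s+")) {
            if (MISSING.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative fragment only, not taken from the attached samples.
        String sample = "/Identity-H usecmap\n"
                + "1 begincidrange\n"
                + "<0000> <ffff> 0\n"
                + "endcidrange\n";
        System.out.println(unhandledOperators(sample)); // prints [usecmap, begincidrange]
    }
}
```

An empty result only means none of the three operators appear as bare tokens; it says nothing about whether the rest of the file parses.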

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

