Mailing-List: contact pdfbox-dev-help@incubator.apache.org; run by ezmlm
Reply-To: pdfbox-dev@incubator.apache.org
Message-ID: <1662620545.1246620828041.JavaMail.jira@brutus>
Date: Fri, 3 Jul 2009 04:33:48 -0700 (PDT)
From: "Jonck van der Kogel (JIRA)"
To: pdfbox-dev@incubator.apache.org
Subject: [jira] Commented: (PDFBOX-139) The CMapParser does not recognize essential cmap operators

    [ https://issues.apache.org/jira/browse/PDFBOX-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726906#action_12726906 ]

Jonck van der Kogel commented
on PDFBOX-139:
--------------------------------------------

Does anyone know of a work-around for the time being? This bug is really annoying.

> The CMapParser does not recognize essential cmap operators
> ----------------------------------------------------------
>
>                 Key: PDFBOX-139
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-139
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1438028
> Originally submitted by vdimchev on 2006-02-24 03:48.
>
> The bug is directly related to the following bug I discovered in the database:
> [ 1208652 ] PDFTextStripper.writeText Exception:Unknown encoding for ..
> I'll try to explain it again here and supply enough resources for its fix.
>
> The problem is that the current implementation of the CMapParser class
> supports only the beginbfchar and beginbfrange operators.
> This is not enough, and it causes the invocation of
> PDFTextStripper.writeText() to throw an IOException with the following
> message: Unknown encoding for 'Identity-V'.
> I also managed to produce the message: Unknown encoding for '90ms-RKSJ-H'.
>
> The complete stacktrace is:
> java.io.IOException: Unknown encoding for 'Identity-V'
>     at org.pdfbox.encoding.EncodingManager.getEncoding(EncodingManager.java:83)
>     at org.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:627)
>     at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:476)
>     at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:332)
>     at org.pdfbox.util.operator.ShowText.process(ShowText.java:66)
>     at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:494)
>     at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:207)
>     at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
>     at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
>     at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
>
> In fact the cause of this exception is that the CMapParser does not
> recognize the begincidchar and begincidrange operators (in the case of the
> 90ms-RKSJ-H encoding) or the usecmap operator (in the case of the
> Identity-V encoding).
> The cmap files for these encodings are not properly parsed, and the
> corresponding CMap objects contain neither one-byte nor two-byte mappings,
> so the lookup() method returns null.
>
> I'll attach two samples for the 90ms-RKSJ-H encoding and one for the
> Identity-V encoding.
> I'll also attach the cmap reference.
>
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168711
> 5014.CIDFont_Spec.rar (application/octet-stream), 240282 bytes
> Reference, containing CMAP description
>
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168709
> ken1.pdf (application/pdf), 33713 bytes
> The Identity-V sample
>
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168708
> tp0404-2a.pdf (application/pdf), 11434 bytes
> The second 90ms-RKSJ-H sample
>
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168705
> nan_youkou.pdf (application/pdf), 7663 bytes
> The first 90ms-RKSJ-H sample

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
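
To make the missing operators from the report concrete, here is a rough sketch of how begincidrange blocks and a usecmap reference could be read from cmap text. It is not the PDFBox CMapParser and not a proposed patch: the class CidRangeSketch, its methods, and the sample values in the usage note are all hypothetical, and the sketch ignores codespace ranges, begincidchar, and the byte length of codes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of PDFBox. */
public class CidRangeSketch {

    /** One "<lo> <hi> firstCid" entry from a begincidrange block. */
    public static final class CidRange {
        public final int lo;
        public final int hi;
        public final int firstCid;

        CidRange(int lo, int hi, int firstCid) {
            this.lo = lo;
            this.hi = hi;
            this.firstCid = firstCid;
        }
    }

    // Everything between "begincidrange" and the matching "endcidrange".
    private static final Pattern BLOCK =
        Pattern.compile("begincidrange(.*?)endcidrange", Pattern.DOTALL);

    // Two hex codes in angle brackets followed by a decimal CID.
    private static final Pattern ENTRY =
        Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>\\s+(\\d+)");

    // "/SomeName usecmap" names the cmap whose mappings are inherited.
    private static final Pattern USECMAP =
        Pattern.compile("/(\\S+)\\s+usecmap");

    /** Collects every range entry found in any begincidrange block. */
    public static List<CidRange> parseCidRanges(String cmapText) {
        List<CidRange> ranges = new ArrayList<CidRange>();
        Matcher block = BLOCK.matcher(cmapText);
        while (block.find()) {
            Matcher entry = ENTRY.matcher(block.group(1));
            while (entry.find()) {
                ranges.add(new CidRange(
                    Integer.parseInt(entry.group(1), 16),
                    Integer.parseInt(entry.group(2), 16),
                    Integer.parseInt(entry.group(3))));
            }
        }
        return ranges;
    }

    /** Returns the name given to usecmap, or null if there is none. */
    public static String parentCMap(String cmapText) {
        Matcher m = USECMAP.matcher(cmapText);
        return m.find() ? m.group(1) : null;
    }

    /** Maps a character code to a CID, or -1 if no range covers it. */
    public static int lookupCid(List<CidRange> ranges, int code) {
        for (CidRange r : ranges) {
            if (code >= r.lo && code <= r.hi) {
                return r.firstCid + (code - r.lo);
            }
        }
        return -1;
    }
}
```

With made-up input such as "/Identity-H usecmap" followed by a block containing "<20> <7e> 1", parseCidRanges would return one range, lookupCid for code 0x21 would give CID 2, and parentCMap would give "Identity-H". A real parser would also have to load the parent cmap and respect the declared codespace ranges.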