From: Leleu Eric
To: dev@pdfbox.apache.org
Date: Wed, 7 Mar 2012 09:15:50 +0100
Subject: Questions about toUnicode Cmap

Hi all,

I'm currently working on the preflight issue PDFBOX-1236 [1].

The error seems to come from how the "toUnicode" CMap is handled in a Type0 font: the "toUnicode" CMap overrides the "Encoding" CMap of the font. Because of this behaviour, the preflight validator receives the Unicode value for each character code present in a text operator instead of the CID value defined by the Encoding CMap.

So I have two questions:
- Is overriding the Encoding the right thing to do?
- Why is the "toUnicode" CMap used to display text?

According to my understanding of the PDF Reference v1.7, the toUnicode CMap is used to extract text from a PDF file and to produce a text file with Unicode characters. To display the text in a PDF reader, the font content and the Encoding CMap seem sufficient.

What is your point of view on these two points?

BR,
Eric

[1] https://issues.apache.org/jira/browse/PDFBOX-1236
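
To make the distinction in the questions above concrete, here is a minimal sketch of the two lookups as separate tables. It is an illustration under stated assumptions, not PDFBox code: the class `CMapSketch`, its methods, and the sample code/CID values are all hypothetical. It only shows why a validator that checks glyphs would want the CID from the Encoding CMap, while text extraction would want the Unicode value from the toUnicode CMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names are PDFBox APIs.
class CMapSketch {
    // Encoding CMap: character code -> CID (selects the glyph in the CIDFont).
    private final Map<Integer, Integer> encodingCMap = new HashMap<>();
    // toUnicode CMap: character code -> Unicode text (used for text extraction).
    private final Map<Integer, String> toUnicodeCMap = new HashMap<>();

    // Registers one character code in both tables.
    void addMapping(int code, int cid, String unicode) {
        encodingCMap.put(code, cid);
        toUnicodeCMap.put(code, unicode);
    }

    // Lookup for rendering or glyph validation; returns -1 if the code is unmapped.
    int cidFor(int code) {
        return encodingCMap.getOrDefault(code, -1);
    }

    // Lookup for text extraction; returns null if the code is unmapped.
    String unicodeFor(int code) {
        return toUnicodeCMap.get(code);
    }

    public static void main(String[] args) {
        CMapSketch font = new CMapSketch();
        font.addMapping(0x0003, 36, "A"); // made-up sample values
        System.out.println("CID for glyph lookup: " + font.cidFor(0x0003));
        System.out.println("Unicode for extraction: " + font.unicodeFor(0x0003));
    }
}
```

Keeping the two tables independent means neither lookup can override the other, which is the behaviour the first question asks about.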