pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] Commented: (PDFBOX-420) Japanese Characters are garbled.
Date Thu, 12 Feb 2009 13:23:59 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672967#action_12672967 ]

Andreas Lehmkühler commented on PDFBOX-420:
-------------------------------------------

I have two questions before I try to add your code to the trunk:

1. Your patch contains a package with 4 new classes. All of them have an old PDFBox license
header. If I add these changes to PDFBox, we have to change the license to the Apache License
2.0 [1]. Is that OK with you and with the author Pin Xue, who is mentioned in 2 of these files?

2. Is the cmapSubstitutions mapping in PDFont complete, or did you only add the mappings you
are interested in? I ask because, if I add the code, I'd like to use a complete mapping.
As far as I understand the CharCode2Unicode mapping, some Unicode files are missing from
your mapping, e.g. the Korean files.
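For illustration only: a "complete" table in the sense of question 2 would cover all four Adobe CJK character collections (GB1, CNS1, Japan1, Korea1), not only Japan1. The sketch below is not the code from the patch or from PDFont. The class name `CMapSubstitutionSketch`, its methods, and the choice of keying the table by `registry-ordering-UCS2` names (the naming the PDF specification uses for its CID-to-Unicode fallback, e.g. `Adobe-Japan1-UCS2`) and mapping them to the `Uni*-UCS2-H` CMaps are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not PDFBox code): a CMap substitution table that
 * covers all four Adobe CJK character collections, including Korean.
 */
public final class CMapSubstitutionSketch {

    // Assumed key format: "registry-ordering-UCS2" (e.g. "Adobe-Japan1-UCS2").
    // Assumed values: the Unicode (UCS-2) CMap of the same character collection.
    private static final Map<String, String> CMAP_SUBSTITUTIONS;

    static {
        Map<String, String> m = new HashMap<>();
        m.put("Adobe-GB1-UCS2", "UniGB-UCS2-H");     // Simplified Chinese
        m.put("Adobe-CNS1-UCS2", "UniCNS-UCS2-H");   // Traditional Chinese
        m.put("Adobe-Japan1-UCS2", "UniJIS-UCS2-H"); // Japanese
        m.put("Adobe-Korea1-UCS2", "UniKS-UCS2-H");  // Korean
        CMAP_SUBSTITUTIONS = Collections.unmodifiableMap(m);
    }

    private CMapSubstitutionSketch() {}

    /** Builds the lookup key from a CIDSystemInfo registry and ordering. */
    static String ucs2Name(String registry, String ordering) {
        return registry + "-" + ordering + "-UCS2";
    }

    /** Returns the substitute CMap name, or null if the collection is not covered. */
    static String substitute(String registry, String ordering) {
        return CMAP_SUBSTITUTIONS.get(ucs2Name(registry, ordering));
    }

    public static void main(String[] args) {
        System.out.println(substitute("Adobe", "Korea1")); // UniKS-UCS2-H
        System.out.println(substitute("Adobe", "Japan1")); // UniJIS-UCS2-H
    }
}
```

Keying the table by character collection makes a gap such as a missing Korean entry easy to spot: an uncovered collection simply returns null.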


[1] http://www.apache.org/licenses/LICENSE-2.0

> Japanese Characters are garbled.
> --------------------------------
>
>                 Key: PDFBOX-420
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-420
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Takashi Komatsubara
>            Priority: Critical
>         Attachments: supportJapanese-fontbox.patch, supportJapanese.patch, TestFilesForJapaneseGarbledIssue.zip
>
>
> The extracted Japanese characters are completely garbled.
> This issue is very critical for Japanese users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

