pdfbox-users mailing list archives

From Big Donkeys <big.donk...@yahoo.com>
Subject Can't extract text Adobe-WinCharSetFFFF-UCS2
Date Thu, 19 Jul 2012 20:09:01 GMT
Hi, I'm having some trouble extracting text from some South Korean PDF files using
PDFTextStripper. When I try, I get a "severe error could not parse predefined CMAP file for
'Adobe-WinCharSetFFFF-UCS2'" message, and the extracted text is gibberish. The file
opens and displays fine in Adobe Reader. I'm using pdfbox-app-1.7.0.jar.

Here is a link to an example PDF that gives me trouble:

http://eng.khoa.go.kr/inc/func/fileDownloadBlob_nori.asp?cmsCd=CM0237&ntNo=626&fNo=4

Any ideas?  

