pdfbox-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Hewson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PDFBOX-3152) NullPointerException in PDType1Font.encode() with centered dot
Date Wed, 16 Dec 2015 01:24:46 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059276#comment-15059276
] 

John Hewson edited comment on PDFBOX-3152 at 12/16/15 1:23 AM:
---------------------------------------------------------------

Yes, I mention something similar above: PDType1Font.HELVETICA uses WinAnsiEncoding, which
does not provide the centered dot character. If you want to use that character you'll have
to embed a font instead. I recommend using PDType0Font.load for this.

Also note that Unicode code point 127 (U+007F) is not "bullet" - it's [DELETE|http://www.fileformat.info/info/unicode/char/007F/index.htm],
which is a control code. font.toUnicode accepts a (font/PDF-specific) _character code_ and
returns a _unicode code point_.

The exception about 8-bit code points is a bit odd: this is a limitation of PDFBox and not
related to the fonts or PDF (i.e. you should be able to encode U+2022 just fine in WinAnsiEncoding).
But if you're interested in using Unicode then I'd recommend staying away from Type 1 fonts
and using PDType0Font instead.


was (Author: jahewson):
Yes, I mention something similar above: PDType1Font.HELVETICA uses WinAnsiEncoding, which
does not provide the centered dot character. If you want to use that character you'll have
to embed a font instead. I recommend using PDType0Font.load for this.

Also note that Unicode code point 127 (U+007F) is not "bullet" - it's [DELETE|http://www.fileformat.info/info/unicode/char/007F/index.htm],
which is a control code. font.toUnicode accepts a (font/PDF-specific) _character code_ and
returns a _code point_.

The exception about 8-bit code points is a bit odd: this is a limitation of PDFBox and not
related to the fonts or PDF (i.e. you should be able to encode U+2022 just fine in WinAnsiEncoding).
But if you're interested in using Unicode then I'd recommend staying away from Type 1 fonts
and using PDType0Font instead.

> NullPointerException in PDType1Font.encode() with centered dot
> --------------------------------------------------------------
>
>                 Key: PDFBOX-3152
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3152
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: 2.0.0 RC2
>            Reporter: Tilman Hausherr
>            Assignee: John Hewson
>             Fix For: 2.0.0
>
>
> As reported by Pedro L.M. in the user mailing list
> Font used: 
> PDType1Font.HELVETICA
> code:{code}content.showText("sol┬Ělicitud");{code}
> {code}
> Exception in thread "main" java.lang.NullPointerException
> 	at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:357)
> 	at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:285)
> 	at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:482)
> {code}
> That character (B7) is listed as "middot" and "periodcentered" in {{glyphlist.txt}},
and "middot" is returned. But "middot" doesn't exist in the inverted list:
> {code}
> {Udieresis=220, tilde=152, greater=62, Ograve=210, bracketright=93, eth=240, otilde=245,
asciitilde=126, odieresis=246, Agrave=192, ograve=242, parenright=41, brokenbar=166, bracketleft=91,
Idieresis=207, copyright=169, aring=229, Egrave=200, Aring=197, trademark=153, idieresis=239,
D=68, E=69, F=70, G=71, A=65, B=66, C=67, L=76, M=77, N=78, O=79, florin=131, H=72, Acircumflex=194,
I=73, J=74, six=54, K=75, currency=164, U=85, T=84, W=87, adieresis=228, V=86, Q=81, P=80,
S=83, R=82, quotesingle=39, Y=89, Eth=208, X=88, braceright=125, Z=90, f=102, g=103, d=100,
Ccedilla=199, e=101, b=98, c=99, a=97, exclamdown=161, thorn=254, n=110, perthousand=137,
o=111, divide=247, acircumflex=226, l=108, m=109, j=106, Odieresis=214, k=107, h=104, i=105,
w=119, v=118, Ugrave=217, u=117, igrave=236, t=116, s=115, r=114, yen=165, q=113, p=112, Yacute=221,
quotedblleft=147, z=122, y=121, bullet=127, threequarters=190, edieresis=235, x=120, atilde=227,
percent=37, Atilde=195, exclam=33, braceleft=123, numbersign=35, ntilde=241, endash=150, scaron=154,
comma=44, threesuperior=179, grave=96, period=46, AE=198, ordfeminine=170, Otilde=213, hyphen=45,
logicalnot=172, ocircumflex=244, colon=58, acute=180, backslash=92, ordmasculine=186, four=52,
question=63, Scaron=138, cent=162, Oacute=211, agrave=224, five=53, at=64, onehalf=189, equal=61,
multiply=215, Ucircumflex=219, oslash=248, asciicircum=94, Icircumflex=206, seven=55, semicolon=59,
registered=174, cedilla=184, ae=230, macron=175, paragraph=182, guilsinglright=155, nine=57,
dagger=134, ellipsis=133, zero=48, oacute=243, circumflex=136, oe=156, uacute=250, guillemotright=187,
ccedilla=231, sterling=163, plus=43, onequarter=188, germandbls=223, Adieresis=196, questiondown=191,
Thorn=222, Oslash=216, bar=124, Uacute=218, onesuperior=185, guilsinglleft=139, space=32,
Zcaron=142, Euro=128, yacute=253, quotedbl=34, quotesinglbase=130, eacute=233, OE=140, ugrave=249,
dollar=36, dieresis=168, ydieresis=255, Edieresis=203, slash=47, egrave=232, mu=181, twosuperior=178,
ucircumflex=251, ampersand=38, degree=176, Eacute=201, udieresis=252, Igrave=204, zcaron=158,
eight=56, quotedblright=148, daggerdbl=135, icircumflex=238, quoteright=146, three=51, quotedblbase=132,
underscore=95, Ecircumflex=202, ecircumflex=234, periodcentered=183, iacute=237, guillemotleft=171,
emdash=151, one=49, Iacute=205, Aacute=193, less=60, Ocircumflex=212, asterisk=42, parenleft=40,
Ntilde=209, section=167, two=50, Ydieresis=159, plusminus=177, quoteleft=145, aacute=225}
> {code}
> "middot" isn't mentioned anywhere in the PDF specification, but "periodcentered" is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Mime
View raw message