pdfbox-users mailing list archives

From Daniel King <dk...@halogensoftware.com>
Subject RE: Supporting multiple languages, including CJK
Date Tue, 18 Oct 2016 22:08:46 GMT
Okay, that makes sense. Back to my root problem: I have a string that I want to write to a
PDF, and it contains both Latin and Arabic characters. I can't use Arial Unicode MS because
we run on a Linux system, and I have been told we also have licensing concerns
since PDFBox embeds the font in the PDF file. I have found that Noto Naskh Arabic
supports Arabic characters but not Latin ones, so a string that contains both
throws a "no glyph" exception when I try to write it.

One idea I have seen is to use a TrueType collection (TTC), but when loading a TTC you seem
to reference only a single font within the collection, which won't allow characters from
both Latin and Arabic to be written from a single string. In theory I could split the
string, but I can't know in advance where the Latin characters occur. Maybe I'm
misunderstanding how TTC works.
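
The string-splitting idea mentioned above can be sketched without knowing in advance where the Latin characters sit. What follows is a minimal sketch using only the JDK, not PDFBox's own mechanism: it walks the string by code point and groups it into runs by Unicode script, so each run can be drawn with one font. The names `ScriptRuns`, `Run` and `split` are hypothetical, and the rule that spaces and punctuation stay in the currently open run is an assumption made for simplicity. It deals only with choosing a font per run; it does not address right-to-left ordering or glyph shaping.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

/** Splits mixed text into runs so each run can be drawn with a single font. */
final class ScriptRuns {

    /** One stretch of text plus a flag saying whether it needs the Arabic font. */
    static final class Run {
        final boolean arabic;
        final String text;

        Run(boolean arabic, String text) {
            this.arabic = arabic;
            this.text = text;
        }
    }

    static List<Run> split(String s) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentArabic = false;

        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Spaces, ASCII digits and most punctuation are COMMON; combining marks
            // are INHERITED. Keep them in whatever run is already open.
            boolean neutral = script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED;
            boolean arabic = neutral ? currentArabic : script == UnicodeScript.ARABIC;

            // Close the open run when the required font changes.
            if (current.length() > 0 && arabic != currentArabic) {
                runs.add(new Run(currentArabic, current.toString()));
                current.setLength(0);
            }
            currentArabic = arabic;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentArabic, current.toString()));
        }
        return runs;
    }
}
```

Each run could then be written with its own font, for example by calling `setFont` and `showText` on a `PDPageContentStream` once per run, with Noto Naskh Arabic for the Arabic runs and a Latin-capable font for the rest.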

Thanks,
Dan

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andreas@lehmi.de] 
Sent: Tuesday, October 18, 2016 2:21 PM
To: users@pdfbox.apache.org
Subject: Re: Supporting multiple languages, including CJK

Am 18.10.2016 um 15:32 schrieb Daniel King:
> I'm curious why you shouldn't load fonts that are scanned in by PDFBox using
> org.apache.fontbox.util.autodetect.FontDirFinder and should instead reference a
> hard-coded system directory?
Because you don't know what you will get when asking the FontMapper for "Arial", especially
if you run your code in different environments or on a different OS.

You may get a simple Arial font with a limited charset, or you may get "Arial Unicode MS",
which has wide support for non-Latin charsets, or you may get any Arial-like font.

IMHO there are too many "may"s, especially if you are looking for a CJK-capable font.

As John already said, the best idea is to choose the font yourself to be sure you get
what you are looking for.
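
One way to read "choose the font yourself" is to check an explicit, ordered list of font files instead of asking for a font by name. The sketch below uses only the JDK and is an illustration, not part of PDFBox; `FontFilePicker` and `firstReadable` are hypothetical names, and the candidate paths would be whatever your deployment actually ships.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Picks a font file from an explicit list instead of a name-based lookup. */
final class FontFilePicker {

    /** Returns the first candidate that is an existing, readable regular file. */
    static Optional<Path> firstReadable(List<Path> candidates) {
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate) && Files.isReadable(candidate)) {
                return Optional.of(candidate);
            }
        }
        // Nothing matched: let the caller fail loudly rather than fall back silently.
        return Optional.empty();
    }
}
```

The chosen path could then be handed to a file-based loader such as `PDType0Font.load(document, path.toFile())`, so the font that ends up in the PDF is the one you picked on every environment.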

BR
Andreas

>
> -----Original Message-----
> From: John Hewson [mailto:john@jahewson.com]
> Sent: Tuesday, October 18, 2016 3:09 AM
> To: users@pdfbox.apache.org
> Subject: Re: Supporting multiple languages, including CJK
>
>
>> On 12 Oct 2016, at 05:24, Daniel King <dking@halogensoftware.com> wrote:
>>
>> Hi,
>>
>> I'm attempting to write text to a PDF in situations where I need to
>> support multiple languages in a single PDF. This may include regular
>> Latin characters as well as CJK characters. I've made many attempts
>> to do this and have it load the character sets from the OS, without
>> much success. The farthest I have gotten is supporting Latin characters,
>> some Russian and, I believe, Vietnamese characters found in the
>> EmbeddedFonts example here:
>> https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/EmbeddedFonts.java?view=markup
>>
>> I'm taking a similar approach to the example, but I believe I'm using
>> the FileSystemFontProvider provided by the FontMappers class by doing
>> something such as
>>
>> TrueTypeFont ttf = FontMappers.instance().getTrueTypeFont("Arial", null).getFont();
>> PDFont font = PDType0Font.load(signatureDocument, ttf.getOriginalData());
>
> Don’t load fonts like this. Follow the approach from the EmbeddedFonts example and
> load them from the filesystem.
>
>> As I mentioned, I seem to be able to support the text in the EmbeddedFonts example
>> but can't seem to determine how I can also support CJK. I’m currently using 2.0.2 of PDFBox
>> but could potentially upgrade to 2.0.3 if that would help at all.
>
> If you have a font which supports CJK then PDFBox should be able to use it. I recommend
> “Arial Unicode MS” as a good starting point, as it provides many more Unicode characters
> than plain “Arial”. Google’s Noto fonts also provide a great selection of characters.
>
> — John
>
>> Thanks for the help,
>> Dan
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>
