lucene-solr-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
Date Thu, 04 Sep 2008 16:31:25 GMT
Thank you, Steve!  See, I knew you'd nail it.  I don't want to complicate the lives of others just
because of one little diacritic.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Steven A Rowe <sarowe@syr.edu>
> To: solr-dev@lucene.apache.org
> Sent: Friday, August 29, 2008 5:57:31 PM
> Subject: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
> 
> On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> > I suspect the PDF formatter just doesn't play nicely with the
> > non-trivial UTF-8 characters.
> 
> This is an Apache FOP FAQ; from 
> :
> 
>    6.2. Some characters are not displayed, or displayed
>         incorrectly, or displayed as "#".
> 
>    This usually means the selected font doesn't have a
>    glyph for the character.
> 
>    The standard text fonts supplied with Acrobat Reader have
>    mostly glyphs for characters from the ISO Latin 1 character
>    set. [...]
> 
>    If you use your own fonts, the font must have a glyph for the
>    desired character. Furthermore the font must be available on
>    the machine where the PDF is viewed or it must have been
>    embedded in the PDF file. [...]
> 
> There's an open Forrest bug for this problem: 
> , and the discussion there 
> includes a link to the Cocoon documentation for embedding fonts in PDF files: 
> .
> 
> This looks kinda complicated, and AFAICT would require modifications to the 
> Forrest installation wherever the site is built.
> 
> I suspect that almost nobody looks at the PDF version of the "Who we are" page 
> (and I sure am sorry now that I brought this up...)
> 
> If things are left as-is, Otis's last name would be displayed properly in the 
> HTML, and garbled in the PDF; if the diacritic is removed, then it will be 
> displayed improperly in both places :)
> 
> Steve

