pdfbox-users mailing list archives

From Hamed Iravanchi <iravan...@gmail.com>
Subject Re: Fonts in pdf to image conversion
Date Wed, 04 Apr 2012 07:35:02 GMT
Hi,

As far as I remember, ICEpdf didn't render right-to-left languages
correctly.
I'm not sure though; maybe it is fixed now.

-Hamed

On Wed, Apr 4, 2012 at 11:48 AM, Nicklas Karlsson <nickarls@gmail.com> wrote:

> Thanks for the information. I continued my search for libraries and
> stumbled on ICEpdf from ICEsoft, and it works there, so you could check for
> hints in their source code while improving PDFBox ;-)
>
> On Wed, Apr 4, 2012 at 9:57 AM, Hamed Iravanchi <iravanchi@gmail.com> wrote:
>
> > Hi Nicklas,
> >
> > I've been working on this issue for a while.
> > Right now, PDFBox cannot correctly convert PDF files created by OpenOffice
> > or LibreOffice to images.
> > In my tests, PDF files created by Microsoft Word do not have this problem
> > with the latest trunk code.
> >
> > This is because the image is rendered from the extracted text rather than
> > from the code points.
> > Andreas used to reply to my emails so we could collaborate and resolve such
> > issues faster, but I haven't received any reply lately.
> > I don't know whether I'm posting in the right place, though...
> >
> > Anyway, to fix this issue for TrueType fonts (which are typically used in
> > your case), PDFBox should do the following:
> > - Use code points for all TrueType fonts, instead of extracted text.
> > - Map the code points to glyph codes using the font's cmap table.
> > - Use the glyph codes to draw text on the image.
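
The three steps listed above can be sketched roughly as follows. This is
only an illustration, not PDFBox code and not the actual patch: the class
GlyphCodeSketch, its methods, and the Map standing in for the font's cmap
are all hypothetical names. It also assumes that the glyph codes AWT expects
for the loaded font are the same as the font's own glyph IDs, which should
be verified for the font in question.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.GlyphVector;
import java.util.Map;

/** Illustrative sketch only; none of these names come from PDFBox. */
final class GlyphCodeSketch {

    /** Glyph 0 is conventionally the ".notdef" glyph in TrueType fonts. */
    static final int NOTDEF_GLYPH = 0;

    /**
     * Steps 1 and 2: take the codes as they appear in the PDF string and
     * look each one up in a cmap-style table (code -> glyph ID).
     * Codes without an entry fall back to .notdef.
     */
    static int[] toGlyphIds(int[] codes, Map<Integer, Integer> cmap) {
        int[] glyphIds = new int[codes.length];
        for (int i = 0; i < codes.length; i++) {
            Integer glyphId = cmap.get(codes[i]);
            glyphIds[i] = (glyphId != null) ? glyphId : NOTDEF_GLYPH;
        }
        return glyphIds;
    }

    /**
     * Step 3: draw by glyph ID, so the result does not depend on how the
     * text was extracted (assumes AWT glyph codes match the font's IDs).
     */
    static void drawGlyphs(Graphics2D g, Font font, int[] glyphIds,
                           float x, float y) {
        GlyphVector glyphs =
                font.createGlyphVector(g.getFontRenderContext(), glyphIds);
        g.drawGlyphVector(glyphs, x, y);
    }
}
```

A caller would build the map from the embedded font's cmap, pass the codes
of each shown string through toGlyphIds, and hand the result to drawGlyphs
together with the Graphics2D of the target image.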
> >
> > I managed to fix this just yesterday in my own code, for my sample PDF
> > files, by modifying the trunk code.
> > But I'm waiting for the developer team to collaborate, so that I can make
> > sure what I'm doing is right and doesn't break other parts of PDFBox.
> >
> > -Hamed
> >
> >
> > On Wed, Mar 28, 2012 at 11:15 AM, Nicklas Karlsson <nickarls@gmail.com> wrote:
> >
> > > Hi,
> > >
> > >  I'm using the latest LibreOffice to produce a PDF and the latest PDFBox
> > > to extract the pages as images, but I'm having some problems with the
> > > fonts. If I use Times New Roman I get a
> > >
> > > org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
> > > Changing font on <test> from <Times New Roman> to the default font
> > >
> > >  If I embed some more exotic fonts in the PDF, I get a
> > >
> > > org.apache.pdfbox.util.PDFStreamEngine processOperator
> > > unsupported/disabled operation: BMC
> > > org.apache.pdfbox.util.PDFStreamEngine processOperator
> > > unsupported/disabled operation: EMC
> > > org.apache.pdfbox.util.PDFStreamEngine processOperator
> > > unsupported/disabled operation: BDC
> > > org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
> > > Changing font on <test> from <Algerian> to the default font
> > >
> > > This is all on the same machine. Is there a special trick to getting the
> > > fonts working?
> > >
> > > The extraction is done with something like
> > >
> > > PDDocument doc = PDDocument.load(pdf);
> > > List pages = doc.getDocumentCatalog().getAllPages();
> > > for (int i = 0; i < pages.size(); i++)
> > > {
> > >     // render each page to an image
> > >     PDPage page = (PDPage) pages.get(i);
> > >     pics.add(page.convertToImage());
> > > }
> > >
> > >
> > > Thanks in advance,
> > >  Nik
> > >
> > > --
> > > ---
> > > Nik
> > >
> >
>
>
>
> --
> ---
> Nik
>
