pdfbox-users mailing list archives

From Jason Cwik <ja...@connecticinc.com>
Subject Embedding images with PDJpeg
Date Mon, 13 Feb 2012 14:57:44 GMT
Hi All,

I'm using PDFBox 1.6 to generate PDF files.  The PDF files contain some
simple text and JPEG images.  The JPEGs are small (~157x200), representing
thumbnails of other documents.

The problem is that only about half of my images display.  The rest show a
blank box where the image should be.  Also, if I run a viewer such as
pdfedit or evince from the command line, I see errors:

jason@butters:~/Desktop$ evince msg4.pdf
Error: Could not find start of jpeg data
Error: Could not find start of jpeg data
Error: Could not find start of jpeg data
Error: Could not find start of jpeg data
Error: Could not find start of jpeg data


Looking at PDJpeg, it appears to read my JPEG into a BufferedImage and
then recompress it into the stream.  The problem (I think) is that,
according to the PDF spec, the stream should really be just the raw DCT
data.  However, when I look at the PDFs generated by PDFBox, I see the
JPEG headers (e.g. 0xff, ... "JFIF") in the stream.  It seems that the
PDF viewers are being lenient and trying to find the DCT data, but are
giving up on some of my images.
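
To see what an image stream actually starts with, something like the
following could be run on the raw stream bytes.  This is only a sketch:
JpegHeaderCheck and its methods are hypothetical names, not part of PDFBox,
and it assumes the stream bytes have already been extracted into a byte
array by some other means.  It looks for the JPEG start-of-image marker
(0xFF 0xD8) and for a "JFIF" tag in an APP0 segment right after it.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not PDFBox API): inspects the first bytes of an image stream. */
final class JpegHeaderCheck {
    private JpegHeaderCheck() {}

    /** Returns the offset of the first SOI marker (0xFF 0xD8), or -1 if there is none. */
    static int findStartOfImage(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xFF && (data[i + 1] & 0xFF) == 0xD8) {
                return i;
            }
        }
        return -1;
    }

    /** True if the data begins with SOI followed directly by a JFIF APP0 segment. */
    static boolean startsWithJfifHeader(byte[] data) {
        if (data.length < 11 || findStartOfImage(data) != 0) {
            return false;
        }
        // Bytes 2-3: APP0 marker (0xFF 0xE0); bytes 4-5: segment length;
        // bytes 6-10: the identifier "JFIF" followed by a zero byte.
        if ((data[2] & 0xFF) != 0xFF || (data[3] & 0xFF) != 0xE0) {
            return false;
        }
        String tag = new String(data, 6, 4, StandardCharsets.US_ASCII);
        return tag.equals("JFIF") && data[10] == 0;
    }
}
```

A result of -1 from findStartOfImage would mean the bytes contain no
start-of-image marker at all, and a result greater than 0 would mean
something else precedes the JPEG data.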

Does this sound correct?

Thanks,
Jason
