poi-dev mailing list archives

From "Craig Stires" <craig.sti...@gmail.com>
Subject RE: doc on hpsf thumbnails for macintosh
Date Fri, 08 Oct 2010 01:36:38 GMT

Hi David,

For anyone else who is looking to pursue extracting the Mac Office
thumbnails, I have been able to render them to jpg (using Graphics2D).

Found an old project by Matthias Wiesmann (JavaQuickdraw, circa 1999).  Got
in touch with him, and he redirected me to a great set of plugins which have
been released under the BSD license by Harald Kuhr, to expand the formats
available for R/W by ImageIO.
https://twelvemonkeys-imageio.dev.java.net/download.html

Great stuff. 

If Rainer Klute (or the person committing on this module) is willing to
adopt the changes I've recommended below, then users will have a very simple
set of code to pull these thumbnails into a BufferedImage.

--------------
Thumbnail thumbobj = new Thumbnail(si.getThumbnail());
if (thumbobj.getClipboardFormatTag() == Thumbnail.CFTAG_MACINTOSH) {
   byte[] thumbdata = thumbobj.getThumbnailAsPICT();
   BufferedImage bi = ImageIO.read(new ByteArrayInputStream(thumbdata));
}
--------------
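
For the final "render to jpg" step mentioned above, here is a minimal sketch. It assumes the PICT data has already been decoded into a BufferedImage (for example by ImageIO with the plugins installed) and uses only the JDK. The class and method names are hypothetical and are not part of POI or the attached patch.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper for illustration; not part of POI or the patch. */
final class ThumbnailJpegSketch {

    /** Paints the decoded image onto an opaque RGB canvas and encodes it as JPEG. */
    static byte[] toJpeg(BufferedImage source) throws IOException {
        // JPEG has no alpha channel, so draw onto a TYPE_INT_RGB image first.
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // White background for any transparent areas of the source.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer handles the format name.
        if (!ImageIO.write(rgb, "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
        return out.toByteArray();
    }
}
```

Drawing through Graphics2D rather than passing the decoded image straight to ImageIO.write sidesteps problems with image types that the JPEG writer may not accept.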

I've attached a patch with the changes initially proposed below.

David, the patch does not include the 8 bytes in the header, since the
BufferedImage renders without them and the saved files open in QTViewer.
I don't have sample files that need the header change yet, but will post
if I find them.


Thanks,
-Craig


-----Original Message-----
From: David Fisher [mailto:dfisher@jmlafferty.com] 
Sent: Thursday, 7 October 2010 1:12 PM
To: POI Developers List
Cc: klute@apache.org
Subject: Re: doc on hpsf thumbnails for macintosh

Hi Craig,

Very interesting. I can vouch for most of this from a PICT generator I wrote
years ago in Fortran. Resolutions other than 72 dpi are possible; my code
also produces 300 dpi PICTs. It's a pretty nice drawing file format, but I
would not compare it with SVG - it's from the original Mac and more like
WMF. Also, you must be really aware of your raster, particularly if you are
aligning characters at a small font size.

The only correction I found - the 512 byte null header should have non-null
content in the first 8 bytes.

6 bytes - 'PICTMD'

2 bytes - integer value [00 06]

But perhaps Office doesn't care about that and the Mac's Clipboard handles
that.
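
A minimal sketch of the header variant described above: 512 bytes, zero except for the first 8, which hold the ASCII characters 'PICTMD' followed by the two bytes [00 06]. The class and method names are hypothetical, and whether any reader actually requires these bytes is left open, as noted.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of POI. */
final class PictFileHeaderSketch {

    static final int HEADER_LENGTH = 512;

    /** Builds a 512-byte PICT file header with 'PICTMD' and [00 06] up front. */
    static byte[] buildHeader() {
        // Java zero-fills new arrays, so bytes 8..511 are already null.
        byte[] header = new byte[HEADER_LENGTH];

        // Bytes 0..5: the six ASCII characters 'PICTMD'.
        byte[] tag = "PICTMD".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(tag, 0, header, 0, tag.length);

        // Bytes 6..7: the 16-bit integer value [00 06].
        header[6] = 0x00;
        header[7] = 0x06;
        return header;
    }
}
```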

Regards,
Dave

On Oct 6, 2010, at 3:01 PM, Craig Stires wrote:

> 
> 
> Hi dev team,
> 
> 
> 
> This is a bit of a long email, but I wanted to pass on the research that
> I've been doing, and some recommendations for changes to the HPSF
> thumbnailing API.
> 
> 
> 
> I have needed to extract thumbnails from a set of Microsoft Office docs.
> They have been produced on Windows, and on Mac.  The existing
> org.apache.poi.hpsf.Thumbnail class handles the Windows case
> (CFTAG_WINDOWS & CF_METAFILEPICT).  However, it does not handle the
> Macintosh case (CFTAG_MACINTOSH & CF_MACQD).
> 
> 
> 
> The Macintosh thumbnails are stored in QuickDraw format (extended version
> 2).  This is the Mac-proprietary SVG equivalent.  The thumbnail has a
> marker at the beginning of the clipboard data, "PICT".  It needs to be
> replaced with 512 null bytes.
> 
> References:
> 
> http://www.fileformat.info/format/macpict/egff.htm
> 
> http://developer.apple.com/legacy/mac/library/documentation/mac/QuickDraw/QuickDraw-462.html#HEADING462-0
> 
> 
> 
> I have managed to create readable files, after a bit of manipulation of
> the clipboard data.  Here is the high-level process for getting a file in
> a valid format.
> 
> 
> 
> 
> 
> Overview of extraction steps
> 
> 
> 
> 01.  Get the summary information from the document (\005SummaryInformation)
> 
> 02.  Get the thumbnail object from summary information
> 
> 03.  Get the clipboard format tag from the thumbnail object
> 
> 04.  Confirm that cftag==CFTAG_MACINTOSH
> 
> 05.  Get the thumbnail data from the thumbnail object
> 
> 06.  Confirm that
> substr(thumbdata,Thumbnail.OFFSET_CF,"PICT".length())=="PICT"
> 
> 07.  Create a byte array with a 512-byte x00 header
> 
> 08.  Append substr(thumbdata, Thumbnail.OFFSET_CF + "PICT".length(),
> thumbdata.length() - Thumbnail.OFFSET_CF - "PICT".length()) to the byte
> array
> 
> 09.  Return the byte array, or write to file (extension PICT, PCT, or PIC.
> mime image/x-pict)
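
Steps 06 to 08 above can be sketched on a raw byte array, without POI on the classpath. This is an illustration under stated assumptions: the offset 8 is assumed to equal Thumbnail.OFFSET_CF (consistent with the proposed OFFSET_MACQDDATA of 12, i.e. 8 plus the 4-byte tag), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of POI. */
final class MacThumbnailSketch {

    /** Assumed to match Thumbnail.OFFSET_CF. */
    static final int OFFSET_CF = 8;
    static final int PICT_HEADER_LENGTH = 512;
    private static final byte[] TAG = "PICT".getBytes(StandardCharsets.US_ASCII);

    /** Converts raw Mac thumbnail data into the bytes of a PICT file. */
    static byte[] toPictFile(byte[] thumbData) {
        int dataStart = OFFSET_CF + TAG.length;
        if (thumbData.length < dataStart) {
            throw new IllegalArgumentException("Thumbnail data too short");
        }

        // Step 06: confirm the "PICT" marker at the clipboard format offset.
        byte[] tag = Arrays.copyOfRange(thumbData, OFFSET_CF, dataStart);
        if (!Arrays.equals(tag, TAG)) {
            throw new IllegalArgumentException("Expected \"PICT\" marker");
        }

        // Step 07: the 512-byte x00 header. New arrays are zero-filled,
        // so allocating the full result is enough.
        int pictLength = thumbData.length - dataStart;
        byte[] pict = new byte[PICT_HEADER_LENGTH + pictLength];

        // Step 08: copy everything after the marker in behind the header.
        System.arraycopy(thumbData, dataStart, pict, PICT_HEADER_LENGTH, pictLength);
        return pict;
    }
}
```

The result can be written to a .pict file as in step 09 or handed to an image reader.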
> 
> 
> 
> 
> 
> Specifications of the Macintosh clipboard formats
> 
> 
> 
> 4 byte (ascii)  - clipboard data format ["PICT"]
> 
> 2 byte  - picture size (byte count)
> 
> 8 byte  - bounding rectangle of picture [ x1 y1  x2 y2 ]
> 
> 2 byte  - VersionOp opcode [00 11]
> 
> 2 byte  - Version opcode [02 FF]
> 
> 2 byte  - Header opcode [0C 00] 
> 
> 24 byte  - header information 
> 
>    - 2 byte  - picture version ( -1 = version 2  ;  -2 = extended version 2 )
> 
>    - 2 byte  - reserved (unused) [ 00 00 ]
> 
>    - 4 byte  - horizontal res [ 00 48 00 00  = 72 dpi ]
> 
>    - 4 byte  - vertical res [ 00 48 00 00  = 72 dpi ]
> 
>    - 8 byte  - source rectangle of picture [ x1 y1  x2 y2 ]
> 
>    - 2 byte  - reserved (unused) [ 00 00 ]
> 
>    - 2 byte  - reserved (unused) [ 00 00 ]
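
The layout above can be read with a big-endian ByteBuffer. The sketch below is an illustration only: it assumes the input starts at the "PICT" marker, treats the resolution fields as 16.16 fixed-point values (so 00 48 00 00 is 72.0), and uses hypothetical class and field names. The trailing source rectangle and reserved fields are not read.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical parser for illustration; not part of POI. */
final class PictHeaderSketch {

    final int pictureSize;      // 16-bit byte count
    final short[] bounds;       // four 16-bit values, in stored order
    final int version;          // -1 = version 2, -2 = extended version 2
    final double horizontalDpi;
    final double verticalDpi;

    private PictHeaderSketch(int pictureSize, short[] bounds, int version,
                             double horizontalDpi, double verticalDpi) {
        this.pictureSize = pictureSize;
        this.bounds = bounds;
        this.version = version;
        this.horizontalDpi = horizontalDpi;
        this.verticalDpi = verticalDpi;
    }

    /** Parses data beginning with "PICT"; too-short input throws BufferUnderflowException. */
    static PictHeaderSketch parse(byte[] clipboardData) {
        ByteBuffer buf = ByteBuffer.wrap(clipboardData).order(ByteOrder.BIG_ENDIAN);

        byte[] tag = new byte[4];
        buf.get(tag);
        if (!"PICT".equals(new String(tag, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("Expected \"PICT\" marker");
        }

        int size = buf.getShort() & 0xFFFF;
        short[] bounds = { buf.getShort(), buf.getShort(), buf.getShort(), buf.getShort() };

        expectOpcode(buf, 0x0011, "VersionOp");
        expectOpcode(buf, 0x02FF, "Version");
        expectOpcode(buf, 0x0C00, "Header");

        int version = buf.getShort();        // signed: -1 or -2
        buf.getShort();                      // reserved
        double hDpi = buf.getInt() / 65536.0; // 16.16 fixed point
        double vDpi = buf.getInt() / 65536.0;
        return new PictHeaderSketch(size, bounds, version, hDpi, vDpi);
    }

    private static void expectOpcode(ByteBuffer buf, int expected, String name) {
        int actual = buf.getShort() & 0xFFFF;
        if (actual != expected) {
            throw new IllegalArgumentException(String.format(
                    "Expected %s opcode %04X but found %04X", name, expected, actual));
        }
    }
}
```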
> 
> 
> 
> 
> 
> Recommendations for change to org.apache.poi.hpsf.Thumbnail
> 
> 
> 
> public static final int CF_MACQD = 15;
> 
> public static final int OFFSET_MACQDDATA = 12;
> 
> private static final String TAG_MACQD = "PICT";
> 
> 
> 
> public long getClipboardFormat() throws HPSFException {
>   long clipboardformat = 0;
> 
>   if (getClipboardFormatTag() == CFTAG_WINDOWS) {
>      // Windows stores the clipboard format as an unsigned 32-bit value.
>      clipboardformat = LittleEndian.getUInt(getThumbnail(), OFFSET_CF);
>   }
>   else if (getClipboardFormatTag() == CFTAG_MACINTOSH) {
>      // The Mac identifies the format with a 4-character ASCII tag.
>      String cftype = new String(getThumbnail(), OFFSET_CF, TAG_MACQD.length(),
>                                 java.nio.charset.StandardCharsets.US_ASCII);
>      // Literal comparison of the tag.
>      if (cftype.equals(TAG_MACQD)) {
>         clipboardformat = CF_MACQD;
>      }
>      else {
>         throw new HPSFException("Clipboard Format Tag of Thumbnail must be " +
>                                 TAG_MACQD + " for CFTAG_MACINTOSH");
>      }
>   }
>   else {
>      throw new HPSFException("Clipboard Format Tag of Thumbnail must be " +
>                              "CFTAG_WINDOWS or CFTAG_MACINTOSH");
>   }
>   return clipboardformat;
> }
> 
> 
> 
> public byte[] getThumbnailAsPICT() throws HPSFException {
>   if (getClipboardFormatTag() != CFTAG_MACINTOSH)
>      throw new HPSFException("Clipboard Format Tag of Thumbnail must " +
>                              "be CFTAG_MACINTOSH.");
>   if (getClipboardFormat() != CF_MACQD)
>      throw new HPSFException("Clipboard Format of Thumbnail must " +
>                              "be CF_MACQD.");
> 
>   byte[] thumbnail = getThumbnail();
>   int pictImageLength = thumbnail.length - OFFSET_MACQDDATA;
> 
>   // A PICT file starts with a 512-byte header.  New arrays are zero-filled,
>   // so only the picture data after the "PICT" tag needs to be copied in.
>   final int headerLength = 512;
>   byte[] pictImage = new byte[headerLength + pictImageLength];
>   System.arraycopy(thumbnail, OFFSET_MACQDDATA, pictImage, headerLength,
>                    pictImageLength);
> 
>   return pictImage;
> }
> 
> 
> 
> 
> 
> 
> 
> All the best,
> 
> -Craig
> 
> 
> 



