From: "Craig Stires"
To: "'POI Developers List'"
Subject: RE: doc on hpsf thumbnails for macintosh
Date: Fri, 8 Oct 2010 08:36:38 +0700

Hi David,

For anyone else looking to extract the Mac Office thumbnails: I have been able to render them to JPEG (using Graphics2D). I found an old project by Matthias Wiesmann (JavaQuickdraw, circa 1999). I got in touch with him, and he pointed me to a great set of plugins, released under the BSD license by Harald Kuhr, that expand the formats ImageIO can read and write.

https://twelvemonkeys-imageio.dev.java.net/download.html

Great stuff. If Rainer Klute (or whoever is committing on this module) is willing to adopt the changes I've recommended below, users will have a very simple piece of code for pulling these thumbnails into a BufferedImage.
--------------
// si is the document's SummaryInformation; getThumbnailAsPICT() is the
// method proposed below, not part of the current Thumbnail API.
Thumbnail thumbobj = new Thumbnail(si.getThumbnail());
if (thumbobj.getClipboardFormatTag() == Thumbnail.CFTAG_MACINTOSH) {
    byte[] thumbdata = thumbobj.getThumbnailAsPICT();
    // ImageIO.read() returns null unless a PICT reader (e.g. the TwelveMonkeys plugin) is registered
    BufferedImage bi = ImageIO.read(new ByteArrayInputStream(thumbdata));
}
--------------

I've attached a patch with the changes initially proposed below. David, the patch does not include the 8 bytes of header content you mention, because I can get a working BufferedImage without them, and the saved files open in QTViewer. I don't have sample files that need the header change yet, but will post them if I find any.

Thanks,
-Craig

-----Original Message-----
From: David Fisher [mailto:dfisher@jmlafferty.com]
Sent: Thursday, 7 October 2010 1:12 PM
To: POI Developers List
Cc: klute@apache.org
Subject: Re: doc on hpsf thumbnails for macintosh

Hi Craig,

Very interesting. I can vouch for most of this from a PICT generator I wrote years ago in Fortran.

Resolutions other than 72 dpi are possible; my code also produces 300 dpi PICTs.

It's a pretty nice drawing file format, but I would not compare it with SVG: it dates from the original Mac and is more like WMF. Also, you must be very aware of your raster, particularly if you are aligning characters at a small font size.

The only correction I found: the 512-byte null header should have non-null content in the first 8 bytes:

6 bytes - 'PICTMD'
2 bytes - integer value [00 06]

But perhaps Office doesn't care about that, and the Mac's Clipboard handles it.

Regards,
Dave

On Oct 6, 2010, at 3:01 PM, Craig Stires wrote:

> Hi dev team,
>
> This is a bit of a long email, but I wanted to pass on the research I've been doing, along with some recommendations for changes to the HPSF thumbnail API.
>
> I have needed to extract thumbnails from a set of Microsoft Office documents produced on both Windows and Mac. The existing org.apache.poi.hpsf.Thumbnail class handles the Windows case (CFTAG_WINDOWS & CF_METAFILEPICT).
> However, it does not handle the Macintosh case (CFTAG_MACINTOSH & CF_MACQD).
>
> The Macintosh thumbnails are stored in QuickDraw PICT format (extended version 2), roughly the Mac-proprietary equivalent of SVG. The clipboard data begins with the marker "PICT", which needs to be replaced with 512 null bytes (the 512-byte header of a PICT file).
>
> References:
> http://www.fileformat.info/format/macpict/egff.htm
> http://developer.apple.com/legacy/mac/library/documentation/mac/QuickDraw/QuickDraw-462.html#HEADING462-0
>
> I have managed to create readable files after a bit of manipulation of the clipboard data. Here is the high-level process for getting a file in a valid format.
>
> Overview of extraction steps
>
> 01. Get the summary information from the document (\005SummaryInformation)
> 02. Get the thumbnail object from the summary information
> 03. Get the clipboard format tag from the thumbnail object
> 04. Confirm that cftag == CFTAG_MACINTOSH
> 05. Get the thumbnail data from the thumbnail object
> 06. Confirm that substr(thumbdata, Thumbnail.OFFSET_CF, "PICT".length()) == "PICT"
> 07. Create a byte array with a 512-byte 0x00 header
> 08. Append substr(thumbdata, Thumbnail.OFFSET_CF + "PICT".length(), thumbdata.length() - Thumbnail.OFFSET_CF - "PICT".length()) to the byte array
> 09. Return the byte array, or write it to a file (extension PICT, PCT, or PIC;
>     MIME type image/x-pict)
>
> Specifications of the Macintosh clipboard format
>
> 4 bytes (ASCII) - clipboard data format ["PICT"]
> 2 bytes - picture size (byte count)
> 8 bytes - bounding rectangle of picture [ top left bottom right ]
> 2 bytes - VersionOp opcode [00 11]
> 2 bytes - Version opcode [02 FF]
> 2 bytes - Header opcode [0C 00]
> 24 bytes - header information (extended version 2 layout)
>   - 2 bytes - picture version ( -1 = version 2 ; -2 = extended version 2 )
>   - 2 bytes - reserved (unused) [ 00 00 ]
>   - 4 bytes - horizontal res [ 00 48 00 00 = 72 dpi ]
>   - 4 bytes - vertical res [ 00 48 00 00 = 72 dpi ]
>   - 8 bytes - source rectangle of picture [ top left bottom right ]
>   - 2 bytes - reserved (unused) [ 00 00 ]
>   - 2 bytes - reserved (unused) [ 00 00 ]
>
> Recommendations for change to org.apache.poi.hpsf.Thumbnail
>
>   public static final int CF_MACQD = 15;
>   public static final int OFFSET_MACQDDATA = 12;
>   private static final String TAG_MACQD = "PICT";
>
>   public long getClipboardFormat() throws HPSFException {
>     long clipboardformat = 0;
>
>     if (getClipboardFormatTag() == CFTAG_WINDOWS) {
>       clipboardformat = LittleEndian.getUInt(getThumbnail(), OFFSET_CF);
>     }
>     else if (getClipboardFormatTag() == CFTAG_MACINTOSH) {
>       // Compare the 4-byte marker literally; String.matches() would treat it as a regex
>       String cftype = new String(getThumbnail(), OFFSET_CF, TAG_MACQD.length());
>       if (cftype.equals(TAG_MACQD)) {
>         clipboardformat = CF_MACQD;
>       }
>       else {
>         throw new HPSFException("Clipboard Format of Thumbnail must be "
>             + TAG_MACQD + " for CFTAG_MACINTOSH");
>       }
>     }
>     else {
>       throw new HPSFException("Clipboard Format Tag of Thumbnail must be "
>           + "CFTAG_WINDOWS or CFTAG_MACINTOSH");
>     }
>     return clipboardformat;
>   }
>
>   public byte[] getThumbnailAsPICT() throws HPSFException {
>     if (getClipboardFormatTag() != CFTAG_MACINTOSH)
>       throw new HPSFException("Clipboard Format Tag of Thumbnail must be CFTAG_MACINTOSH.");
>     if (getClipboardFormat() != CF_MACQD)
>       throw new HPSFException("Clipboard Format of Thumbnail must be CF_MACQD.");
>
>     byte[] thumbnail = getThumbnail();
>     int pictImageLength = thumbnail.length - OFFSET_MACQDDATA;
>     // Java zero-fills new arrays, so the first 512 bytes are already the null header
>     byte[] pictImage = new byte[512 + pictImageLength];
>     // arraycopy(src, srcPos, dest, destPos, length): QuickDraw data goes after the header
>     System.arraycopy(thumbnail, OFFSET_MACQDDATA, pictImage, 512, pictImageLength);
>     return pictImage;
>   }
>
> All the best,
> -Craig

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
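The extraction steps and the proposed getThumbnailAsPICT() can also be sketched without touching POI, as a standalone helper working on the byte array returned by SummaryInformation.getThumbnail(). This is a minimal sketch, assuming the layout described in this thread: a little-endian clipboard format tag at offset 4 (CFTAG_MACINTOSH = -2, assumed to mirror Thumbnail.OFFSET_CFTAG), the ASCII marker "PICT" at offset 8 (Thumbnail.OFFSET_CF) and the QuickDraw data from offset 12. The class name PictThumbnailSketch and the withPictmdSignature flag (an optional take on David's 'PICTMD' note, which the thread leaves unconfirmed) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turns raw Mac thumbnail bytes into the contents of a PICT file. */
final class PictThumbnailSketch {
    // Offsets and tag value assumed to mirror org.apache.poi.hpsf.Thumbnail
    static final int OFFSET_CFTAG = 4;
    static final int OFFSET_CF = 8;
    static final int OFFSET_MACQDDATA = 12;  // constant proposed in this thread
    static final int CFTAG_MACINTOSH = -2;
    static final int PICT_HEADER_LENGTH = 512;

    private PictThumbnailSketch() {}

    /**
     * Steps 03-08: validate the tag and marker, then copy the QuickDraw data
     * behind a 512-byte header. With withPictmdSignature set (hypothetical
     * option), the first 8 header bytes become 'PICTMD' 00 06; otherwise the
     * whole header stays 0x00.
     */
    static byte[] toPictFile(byte[] thumbnail, boolean withPictmdSignature) {
        if (thumbnail.length < OFFSET_MACQDDATA) {
            throw new IllegalArgumentException("Thumbnail too short: " + thumbnail.length + " bytes");
        }
        // Steps 03-04: the clipboard format tag is read as a little-endian int
        int cfTag = ByteBuffer.wrap(thumbnail).order(ByteOrder.LITTLE_ENDIAN).getInt(OFFSET_CFTAG);
        if (cfTag != CFTAG_MACINTOSH) {
            throw new IllegalArgumentException("Expected CFTAG_MACINTOSH (-2), got " + cfTag);
        }
        // Step 06: the clipboard data must start with the ASCII marker "PICT"
        String marker = new String(thumbnail, OFFSET_CF, 4, StandardCharsets.US_ASCII);
        if (!"PICT".equals(marker)) {
            throw new IllegalArgumentException("Expected \"PICT\" marker, got \"" + marker + "\"");
        }
        // Steps 07-08: new arrays are zero-filled, so only the QuickDraw data needs copying
        int dataLength = thumbnail.length - OFFSET_MACQDDATA;
        byte[] pict = new byte[PICT_HEADER_LENGTH + dataLength];
        if (withPictmdSignature) {
            byte[] signature = "PICTMD".getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(signature, 0, pict, 0, signature.length);
            pict[6] = 0x00;
            pict[7] = 0x06;
        }
        System.arraycopy(thumbnail, OFFSET_MACQDDATA, pict, PICT_HEADER_LENGTH, dataLength);
        return pict;
    }
}
```

Step 09 would then be a single write, e.g. `Files.write(Paths.get("thumb.pct"), PictThumbnailSketch.toPictFile(si.getThumbnail(), false))`. Throwing IllegalArgumentException rather than HPSFException keeps the sketch free of POI dependencies; inside Thumbnail itself the proposed HPSFException would be the natural choice.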
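The header layout in Craig's specification can be read back with a small parser, which is useful for sanity-checking an extracted thumbnail (for instance, that its resolution really is 72 dpi). This is a minimal sketch under stated assumptions: PICT data is big-endian (Motorola 68k byte order), picFrame is a QuickDraw Rect stored as top, left, bottom, right, and hRes/vRes are 16.16 fixed-point values present only in extended version 2 headers (version -2). PictHeaderSketch and its Header type are hypothetical names, not part of POI.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical parser for the start of a PICT version 2 picture. */
final class PictHeaderSketch {
    static final class Header {
        final int picSize;                     // 16-bit size field (may not reflect large pictures)
        final short top, left, bottom, right;  // picFrame in QuickDraw Rect order
        final short version;                   // -1 = version 2, -2 = extended version 2
        final double hRes, vRes;               // dpi; NaN unless extended version 2

        Header(int picSize, short top, short left, short bottom, short right,
               short version, double hRes, double vRes) {
            this.picSize = picSize;
            this.top = top;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.version = version;
            this.hRes = hRes;
            this.vRes = vRes;
        }
    }

    private PictHeaderSketch() {}

    /** Parses from offset; throws BufferUnderflowException if the data is truncated. */
    static Header parse(byte[] data, int offset) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        b.position(offset);
        int picSize = b.getShort() & 0xFFFF;
        short top = b.getShort();
        short left = b.getShort();
        short bottom = b.getShort();
        short right = b.getShort();
        expect(b.getShort(), 0x0011, "VersionOp");
        expect(b.getShort(), 0x02FF, "Version");
        expect(b.getShort(), 0x0C00, "HeaderOp");
        short version = b.getShort();
        double hRes = Double.NaN;
        double vRes = Double.NaN;
        if (version == -2) {
            b.getShort();                 // reserved
            hRes = b.getInt() / 65536.0;  // Fixed 16.16: 00 48 00 00 -> 72.0
            vRes = b.getInt() / 65536.0;
        }
        return new Header(picSize, top, left, bottom, right, version, hRes, vRes);
    }

    private static void expect(short actual, int expected, String name) {
        if ((actual & 0xFFFF) != expected) {
            throw new IllegalArgumentException(String.format(
                "%s: expected %04X, got %04X", name, expected, actual & 0xFFFF));
        }
    }
}
```

For the raw bytes from SummaryInformation.getThumbnail(), the QuickDraw data would start at offset 12 (the proposed OFFSET_MACQDDATA); for a saved .pct file it starts after the 512-byte header, at offset 512.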