poi-user mailing list archives

From Hong-Thai Nguyen <Hong-Thai.Ngu...@polyspot.com>
Subject Extract thumbnail of MS Office ?
Date Fri, 21 Feb 2014 10:28:28 GMT
Hi all,

I'm trying to extract the thumbnail of an MS Word document using HPSF (this file has an embedded thumbnail).
Following the documentation at http://poi.apache.org/hpsf/thumbnails.html, I wrote the following code:
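import java.io.File;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hpsf.Thumbnail;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.AbstractWordUtils;

static byte[] process(File docFile) throws Exception {
    final HWPFDocumentCore wordDocument = AbstractWordUtils.loadDoc(docFile);
    SummaryInformation summaryInformation = wordDocument.getSummaryInformation();
    System.out.println(summaryInformation.getAuthor());
    System.out.println(summaryInformation.getApplicationName() + ":" + summaryInformation.getTitle());
    // Wrap the raw thumbnail bytes from the summary information stream
    Thumbnail thumbnail = new Thumbnail(summaryInformation.getThumbnail());
    System.out.println(thumbnail.getClipboardFormat());    // HPSFException is thrown here
    System.out.println(thumbnail.getClipboardFormatTag());
    return thumbnail.getThumbnailAsWMF();
}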

Unfortunately, the extraction raises an exception:
Converting E:\test.doc
Saving output to E:\test.wmf
org.apache.poi.hpsf.HPSFException: Clipboard Format Tag of Thumbnail must be CFTAG_WINDOWS.
       at org.apache.poi.hpsf.Thumbnail.getClipboardFormat(Thumbnail.java:234)
       at DOC2JPG.process(DOC2JPG.java:52)
       at DOC2JPG.main(DOC2JPG.java:33)
Michel ARNOULD
Microsoft Word 9.0:GROUPE DE PAIRS DE VILLIERS-ST-GEORGES

I exported the content of summaryInformation.getThumbnail() to a file and viewed it in a hex editor.
The 4-byte Clipboard Format Tag is FF FF FF FF, which is never read as -1 (CFTAG_WINDOWS) but as 4294967295:
18 33 00 00 FF FF FF FF 03 00 00 00 08 00 05 52
01 74 E2 18 01 00 09 00 00 03 7C 19 00 00 0A 00
...
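A minimal sketch of what appears to be happening, assuming that POI's Thumbnail.getClipboardFormatTag() reads these 4 bytes as an unsigned little-endian value into a long, while CFTAG_WINDOWS is declared as the int -1. The class CfTagCheck and its helpers are hypothetical and use no POI classes. FF FF FF FF is -1 as a signed 32-bit int but 4294967295 as an unsigned one, and because Java widens the int -1 to the long -1 before comparing, the equality check can never succeed:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CfTagCheck {
    // Assumed to mirror POI's constant: an int equal to -1.
    static final int CFTAG_WINDOWS = -1;

    // Reads 4 bytes at the given offset as a signed little-endian int.
    static int readSigned(byte[] blob, int offset) {
        return ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    // Reads the same 4 bytes as an unsigned value widened to long.
    static long readUnsigned(byte[] blob, int offset) {
        return Integer.toUnsignedLong(readSigned(blob, offset));
    }

    public static void main(String[] args) {
        // First 12 bytes of the thumbnail blob from the hex dump above.
        byte[] blob = {
            0x18, 0x33, 0x00, 0x00,
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
            0x03, 0x00, 0x00, 0x00
        };
        long unsignedTag = readUnsigned(blob, 4);
        int signedTag = readSigned(blob, 4);
        System.out.println(unsignedTag);                   // 4294967295
        System.out.println(signedTag);                     // -1
        // The int -1 is sign-extended to -1L, so this comparison is false.
        System.out.println(unsignedTag == CFTAG_WINDOWS);  // false
        System.out.println(signedTag == CFTAG_WINDOWS);    // true
    }
}
```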

I tested some other Word documents, and the format tag value is always 4294967295.
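If the tag is always FF FF FF FF, one possible workaround is to skip Thumbnail.getClipboardFormat() and parse the blob directly. The sketch below is hypothetical (ThumbnailWorkaround and extractWmf are not POI API) and assumes the layout visible in the hex dump: the tag at offset 4, the clipboard format at offset 8 (3 is the Windows CF_METAFILEPICT value), and the WMF data starting at offset 20, after an 8-byte METAFILEPICT header. The dump does show the usual WMF header bytes 01 00 09 00 00 03 at that offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ThumbnailWorkaround {
    // Offsets assumed from the hex dump of the thumbnail blob.
    static final int OFFSET_CFTAG = 4;     // clipboard format tag (-1 = CFTAG_WINDOWS)
    static final int OFFSET_CF = 8;        // clipboard format (3 = CF_METAFILEPICT)
    static final int OFFSET_WMFDATA = 20;  // WMF data after an 8-byte METAFILEPICT header
    static final int CFTAG_WINDOWS = -1;
    static final int CF_METAFILEPICT = 3;

    // Hypothetical helper: returns the WMF bytes, or throws if the blob
    // does not look like a Windows metafile thumbnail.
    static byte[] extractWmf(byte[] thumbnail) {
        if (thumbnail == null || thumbnail.length <= OFFSET_WMFDATA) {
            throw new IllegalArgumentException("Thumbnail blob too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(thumbnail).order(ByteOrder.LITTLE_ENDIAN);
        // Read the tag as a signed int so FF FF FF FF compares equal to -1.
        if (buf.getInt(OFFSET_CFTAG) != CFTAG_WINDOWS) {
            throw new IllegalArgumentException("Clipboard format tag is not CFTAG_WINDOWS");
        }
        if (buf.getInt(OFFSET_CF) != CF_METAFILEPICT) {
            throw new IllegalArgumentException("Clipboard format is not CF_METAFILEPICT");
        }
        // Everything from offset 20 to the end is returned as the WMF data.
        return Arrays.copyOfRange(thumbnail, OFFSET_WMFDATA, thumbnail.length);
    }
}
```

With this helper, the last line of process() could become: return ThumbnailWorkaround.extractWmf(summaryInformation.getThumbnail());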

Thanks a lot for your help.

Hong-Thai

