poi-user mailing list archives

From Hong-Thai Nguyen <Hong-Thai.Ngu...@polyspot.com>
Subject RE: Extract thumbnail of MS Office ?
Date Mon, 24 Feb 2014 12:40:54 GMT
Hi,

I have probably found the bug. It is on lines 205 and 237 of org.apache.poi.hpsf.Thumbnail:
long clipboardFormatTag = LittleEndian.getUInt(getThumbnail(),
                                                       OFFSET_CFTAG);

This should be LittleEndian.getInt() instead.
FF FF FF FF will then be interpreted as -1 as expected, not as '4294967295'.
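
To illustrate the difference, here is a minimal standalone sketch that does not use POI at all. It reads the same four bytes once as a signed 32-bit value and once as an unsigned value widened to long, then compares each against -1. The class name, the method names and the local CFTAG_WINDOWS constant are made up for this example; they only mirror the value mentioned in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ClipboardTagDemo {
    // Local stand-in for the CFTAG_WINDOWS value quoted in this thread (assumption).
    static final int CFTAG_WINDOWS = -1;

    /** Reads 4 little-endian bytes at the given offset as a signed 32-bit int. */
    static int readSigned(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }

    /** Reads the same 4 bytes as an unsigned value, widened to a long. */
    static long readUnsigned(byte[] data, int offset) {
        return readSigned(data, offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] tag = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};

        long unsigned = readUnsigned(tag, 0);
        int signed = readSigned(tag, 0);

        System.out.println(unsigned);                  // 4294967295
        System.out.println(signed);                    // -1
        System.out.println(unsigned == CFTAG_WINDOWS); // false
        System.out.println(signed == CFTAG_WINDOWS);   // true
    }
}
```

The unsigned read can never equal -1, which would explain why the check against CFTAG_WINDOWS fails for every document.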

Thanks

Hong-Thai

From: Hong-Thai Nguyen
Sent: Friday, 21 February 2014 11:28
To: 'user@poi.apache.org'
Subject: Extract thumbnail of MS Office ?

Hi all,

I'm trying to extract the thumbnail of an MS Word document using HPSF (the file has an embedded thumbnail).
Following the documentation at http://poi.apache.org/hpsf/thumbnails.html, I wrote the following code:
static byte[] process(File docFile) throws Exception {
    final HWPFDocumentCore wordDocument = AbstractWordUtils.loadDoc(docFile);
    SummaryInformation summaryInformation = wordDocument.getSummaryInformation();
    System.out.println(summaryInformation.getAuthor());
    System.out.println(summaryInformation.getApplicationName() + ":" + summaryInformation.getTitle());
    Thumbnail thumbnail = new Thumbnail(summaryInformation.getThumbnail());
    System.out.println(thumbnail.getClipboardFormat());
    System.out.println(thumbnail.getClipboardFormatTag());
    return thumbnail.getThumbnailAsWMF();
}

Unfortunately, the extraction raises an exception:
Converting E:\test.doc
Saving output to E:\test.wmf
org.apache.poi.hpsf.HPSFException: Clipboard Format Tag of Thumbnail must be CFTAG_WINDOWS.
       at org.apache.poi.hpsf.Thumbnail.getClipboardFormat(Thumbnail.java:234)
       at DOC2JPG.process(DOC2JPG.java:52)
       at DOC2JPG.main(DOC2JPG.java:33)
Michel ARNOULD
Microsoft Word 9.0:GROUPE DE PAIRS DE VILLIERS-ST-GEORGES

I exported the content of summaryInformation.getThumbnail() to a file and opened it in a hex viewer. The
4-byte value of the clipboard format tag is never read as -1 (CFTAG_WINDOWS), but as '4294967295':
18 33 00 00 FF FF FF FF 03 00 00 00 08 00 05 52
01 74 E2 18 01 00 09 00 00 03 7C 19 00 00 0A 00
...
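
For reference, the first line of the dump above can be decoded with a small standalone sketch (plain Java, no POI). It assumes the clipboard format tag sits at byte offset 4 and the clipboard format at byte offset 8, which is where FF FF FF FF and 03 00 00 00 appear in the dump. The class and helper names are hypothetical, and the first 4 bytes are only printed, not interpreted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ThumbnailHeaderDump {
    // Offsets inferred from the hex dump in this message (assumption).
    static final int OFFSET_CFTAG = 4;
    static final int OFFSET_CF = 8;

    /** Reads a signed little-endian 32-bit int at an absolute offset. */
    static int int32At(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Parses a whitespace-separated hex string such as "18 33 00 00". */
    static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] header = fromHex("18 33 00 00 FF FF FF FF 03 00 00 00 08 00 05 52");

        System.out.println("first 4 bytes:        " + int32At(header, 0));            // 13080
        System.out.println("clipboard format tag: " + int32At(header, OFFSET_CFTAG)); // -1
        System.out.println("clipboard format:     " + int32At(header, OFFSET_CF));    // 3
    }
}
```

Read as signed values, the tag comes out as -1 and the format as 3, so the bytes in the file look consistent with a Windows clipboard thumbnail; only the unsigned read produces '4294967295'.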

I tested several other Word documents, and the format tag value is always '4294967295'.

Thanks a lot for your help.

Hong-Thai

