poi-user mailing list archives

From Chris Bamford <cbamf...@mimecast.com>
Subject Re: Extracting embedded files from HWPF docs
Date Mon, 10 Jun 2013 12:21:39 GMT
Hi again Nick,

This problem appears to be Mac-specific; I have had more luck with a .doc file created natively
in Windows :-)
Now POIFSLister shows the ObjectPool and the item in it:

Root Entry -
  SummaryInformation <(0x05)SummaryInformation> [412 / 0x19c]
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [280 / 0x118]
  WordDocument [4142 / 0x102e]
  1Table [2087 / 0x827]
  ObjectPool -
    _1432368106 -
      CompObj <(0x01)CompObj> [76 / 0x4c]
      ObjInfo <(0x03)ObjInfo> [6 / 0x6]
      Ole10Native <(0x01)Ole10Native> [568849 / 0x8ae11]
      EPRINT <(0x03)EPRINT> [5000 / 0x1388]
  CompObj <(0x01)CompObj> [113 / 0x71]
  Data [4096 / 0x1000]

Please can you point me to any resources which could help me to save the embedded file to
another file (i.e. read all the bytes and save them somewhere)?
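
One possible route, offered as a hedged sketch: the <(0x01)Ole10Native> entry under ObjectPool/_1432368106 holds an OLE 1.0 "Package" wrapper around the dropped file. Apache POI's org.apache.poi.poifs.filesystem.Ole10Native class decodes this wrapper (Ole10Native.createFromEmbeddedOleObject(DirectoryNode), then getFileName() and getDataBuffer()), and that class is what you would normally use. The JDK-only code below only illustrates the byte layout that class reads. It makes two assumptions: the common structured variant, where the first flags field equals 2, and ISO-8859-1 as a stand-in for the writer's ANSI code page. The names Ole10NativeSketch, Payload, parse and save are hypothetical and not part of POI. The code expects the raw stream bytes, which you could obtain, for example, by reading that entry with POI's DocumentInputStream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Illustrative decoder for the structured OLE 1.0 "Package" layout of a
 * (0x01)Ole10Native stream. Hypothetical helper, not part of Apache POI;
 * POI's Ole10Native class handles more variants and should be preferred.
 */
public final class Ole10NativeSketch {

    /** Decoded fields (hypothetical names, not POI's API). */
    public static final class Payload {
        public final String label;    // caption shown in the document
        public final String fileName; // original path of the embedded file
        public final String command;  // temporary path recorded by the packager
        public final byte[] data;     // raw bytes of the embedded file (e.g. the MP3)

        Payload(String label, String fileName, String command, byte[] data) {
            this.label = label;
            this.fileName = fileName;
            this.command = command;
            this.data = data;
        }
    }

    /** Parses the stream; assumes the structured layout whose first flags field is 2. */
    public static Payload parse(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt(); // total size of the remaining stream (not validated here)
        short flags = buf.getShort();
        if (flags != 2) {
            throw new IllegalArgumentException("unsupported Ole10Native layout, flags=" + flags);
        }
        String label = readCString(buf);
        String fileName = readCString(buf);
        buf.getShort(); // second flags field, ignored
        buf.getShort(); // unknown field, ignored
        int commandLen = buf.getInt(); // length of the temp path, including its NUL
        if (commandLen < 0 || commandLen > buf.remaining()) {
            throw new IllegalArgumentException("bad command length " + commandLen);
        }
        byte[] cmd = new byte[commandLen];
        buf.get(cmd);
        String command = new String(cmd, 0, Math.max(0, commandLen - 1), StandardCharsets.ISO_8859_1);
        int dataSize = buf.getInt(); // number of bytes of the embedded file
        if (dataSize < 0 || dataSize > buf.remaining()) {
            throw new IllegalArgumentException("bad data size " + dataSize);
        }
        byte[] data = new byte[dataSize];
        buf.get(data);
        return new Payload(label, fileName, command, data);
    }

    /** Reads a NUL-terminated string; ISO-8859-1 stands in for the writer's ANSI code page. */
    private static String readCString(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) {
            // advance past the bytes of the string
        }
        int len = buf.position() - start - 1; // exclude the NUL terminator
        return new String(buf.array(), start, len, StandardCharsets.ISO_8859_1);
    }

    /** Writes the embedded bytes into dir, named after the last component of fileName. */
    public static Path save(Payload payload, Path dir) throws IOException {
        String name = payload.fileName.replace('\\', '/');
        name = name.substring(name.lastIndexOf('/') + 1);
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            name = "embedded.bin"; // hypothetical fallback name
        }
        Path target = dir.resolve(name);
        Files.write(target, payload.data);
        return target;
    }
}
```

Under these assumptions, Ole10NativeSketch.save(Ole10NativeSketch.parse(bytes), Paths.get("/tmp")) would write the embedded file using only the last component of its stored name. Keeping just that component avoids recreating the original Windows directory structure on disk.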

Thanks,

- Chris

On 10 Jun 2013, at 09:33, Chris Bamford wrote:

> Hi Nick,
> 
> I created a .doc file with an embedded MP3 (that is, I dragged an MP3 file from Finder and dropped it into the document, whereupon Word displayed a small image of a loudspeaker; I took this as a positive sign!).
> I then added some text for good measure and saved it, taking care to save it as "Word 97 - 2004".
> Then I ran POIFSLister -sizes on it and got:
> 
> Root Entry -
>  SummaryInformation <(0x05)SummaryInformation> [4096 / 0x1000]
>  DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [4096 / 0x1000]
>  WordDocument [9152 / 0x23c0]
>  1Table [7280 / 0x1c70]
>  CompObj <(0x01)CompObj> [96 / 0x60]
> 
> Looking closer in the debugger, I discovered that none of the entries shown are of type DirectoryNode, so I cannot even start the process of finding / extracting the MP3.
> Any ideas what I might be doing wrong?
> Thanks,
> 
> - Chris
> 
> On 7 Jun 2013, at 14:26, Chris Bamford wrote:
> 
> Thanks Nick, must have missed that. Will check it out.
> 
> Chris
> 
> On 7 Jun 2013, at 14:12, Nick Burch wrote:
> 
>> On Fri, 7 Jun 2013, Chris Bamford wrote:
>>> Is there a way to extract files embedded into Word docs (.doc, not .docx), using the HWPF package?
>> 
>> Does the information on http://poi.apache.org/poifs/embeded.html not cover what you need?
>> 
>> Nick
> 
> 
> Chris Bamford
> Senior Developer
