poi-user mailing list archives

From Dmitry Goldenberg <DGoldenb...@attivio.com>
Subject Re: How to extract embedded files from Office 07
Date Fri, 29 Aug 2008 12:24:16 GMT
Nick,
That's what I figured from looking at a couple of hex dumps last night. I'll add that to my code.

Thanks for your help.

- Dmitry

----- Original Message -----
From: Nick Burch <nick@torchbox.com>
To: POI Users List <user@poi.apache.org>
Sent: Fri Aug 29 05:43:13 2008
Subject: RE: How to extract embedded files from Office 07

On Thu, 28 Aug 2008, Dmitry Goldenberg wrote:
> You were right, the .bin file in /embeddings is Ole and can be read with
> POIFS.

It looks like there are three entries within the POIFS file:
   Ole <(0x01)Ole>
   CompObj <(0x01)CompObj>
   Ole10Native <(0x01)Ole10Native>

> The gotcha is, there's currently no API to extract the file out of the
> Ole structures within POIFS.

It should be a five-minute job: grab the POIFS entry, get the bytes, and
write them to a FileOutputStream. Probably 15-20 minutes including unit
tests and overloaded methods :)
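
A minimal sketch of the "get the bytes and write them out" half, in plain
Java. It deliberately takes any InputStream so that it compiles without POI
on the classpath; the assumption is that the stream would come from the
POIFS entry, for example via POIFSFileSystem.createDocumentInputStream(...)
with the entry name "\u0001Ole10Native" (check that call against the POI
version in use). The class and method names here are hypothetical, not part
of POI.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class EntryDumper {
    /**
     * Copies every byte from 'in' to 'target' and returns the byte count.
     * 'in' is assumed to be the stream of a POIFS entry, but any
     * InputStream works.
     */
    static long copyToFile(InputStream in, File target) throws IOException {
        long total = 0;
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] chunk = new byte[8192];
            int n;
            // read() returns -1 once the entry is exhausted
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Note that this writes the entry verbatim, so for Ole10Native the output
still includes the header described below.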


The slight snag will be that the Ole10Native entry isn't quite what you
want. It contains the file name, the absolute file name, a little bit more
bumf, then your real file data after that. A little bit of work will be
needed to figure out where the real file data starts, but then
you'd be away!
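
To make the "where does the real data start" step concrete, here is a
rough sketch of a header skipper in plain Java. The byte layout is an
assumption based on what Packager-style Ole10Native streams commonly look
like, not something confirmed in this thread: a 4-byte little-endian total
size, a 2-byte flags field, a NUL-terminated label, a NUL-terminated
source path, 4 unknown bytes, a 4-byte length followed by a temp path of
that length, then a 4-byte data length and the data itself. Please check
it against a hex dump of your own files. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Ole10NativeSketch {
    /** Holds the fields this sketch pulls out of the stream. */
    static final class Embedded {
        final String label;
        final String sourcePath;
        final byte[] data;

        Embedded(String label, String sourcePath, byte[] data) {
            this.label = label;
            this.sourcePath = sourcePath;
            this.data = data;
        }
    }

    /** Parses the ASSUMED layout described above; throws on truncated input. */
    static Embedded parse(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt();                        // total size (assumed, ignored)
        buf.getShort();                      // flags (assumed, ignored)
        String label = readCString(buf);     // display file name
        String sourcePath = readCString(buf); // absolute file name
        buf.getInt();                        // unknown 4 bytes (assumed)
        int tempLen = buf.getInt();          // assumed temp-path length, incl. NUL
        if (tempLen < 0 || tempLen > buf.remaining()) {
            throw new IllegalArgumentException("bad temp path length: " + tempLen);
        }
        buf.position(buf.position() + tempLen); // skip the temp path
        int dataLen = buf.getInt();          // length of the real file data
        if (dataLen < 0 || dataLen > buf.remaining()) {
            throw new IllegalArgumentException("bad data length: " + dataLen);
        }
        byte[] data = new byte[dataLen];
        buf.get(data);
        return new Embedded(label, sourcePath, data);
    }

    /** Reads bytes up to (and consuming) the next NUL as a single-byte string. */
    private static String readCString(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) {
            // keep scanning; BufferUnderflowException if no NUL is found
        }
        int length = buf.position() - 1 - start;
        return new String(buf.array(), start, length, StandardCharsets.ISO_8859_1);
    }
}
```

The bounds checks are there because a wrong guess about the layout tends
to show up as an absurd length field, which is a useful early warning.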

Nick

