poi-user mailing list archives

From "Erik Sundin" <sundin.m...@gmail.com>
Subject Extracting embedding objects in Word Documents
Date Fri, 25 Jan 2008 07:27:34 GMT

I am using POI to "unpack" a Word document that contains embedded objects,
such as other Word documents and pictures. The embedded objects can be
nested several layers deep: an embedded Word document can itself contain an
embedded document, and so on.

My intention is to extract all of these objects into a flat structure. I
have succeeded in doing so by first using POIFSFileSystem to get an image of
the original document. I then get the DirectoryEntry "ObjectPool" and
recursively look for entries like "WordDocument", "WorkBook" and
"PowerPoint Document". If I find one of these, I create a new
POIFSFileSystem, copy the whole structure of the original embedded object
into it, and write it to disk.
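
The recursive walk described above can be sketched roughly as follows. This
is only a model of the traversal, not POI code: the Dir class is a
hypothetical in-memory stand-in for a POIFS DirectoryEntry, the storage
names "_1" and "_2" are made up, and the sketch assumes that each embedded
document keeps its own nested "ObjectPool" directory next to its marker
stream. It collects the paths of the embedded documents instead of copying
them to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatExtractSketch {

    // Hypothetical stand-in for a POIFS directory: child directories plus
    // the names of the streams stored directly inside it.
    public static class Dir {
        public final String name;
        public final Map<String, Dir> dirs = new LinkedHashMap<>();
        public final List<String> streams = new ArrayList<>();

        public Dir(String name, String... streams) {
            this.name = name;
            this.streams.addAll(Arrays.asList(streams));
        }

        public Dir add(Dir child) {
            dirs.put(child.name, child);
            return this;
        }
    }

    // Stream names that mark a directory as an embedded document,
    // taken from the description above.
    static final List<String> MARKERS =
            Arrays.asList("WordDocument", "WorkBook", "PowerPoint Document");

    public static boolean isDocument(Dir d) {
        for (String marker : MARKERS) {
            if (d.streams.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    // Appends the path of every embedded document found under doc's
    // "ObjectPool", then descends into each one to find deeper layers.
    public static void collect(Dir doc, String path, List<String> out) {
        Dir pool = doc.dirs.get("ObjectPool");
        if (pool == null) {
            return; // this document has no embedded objects
        }
        for (Dir child : pool.dirs.values()) {
            if (isDocument(child)) {
                String childPath = path + "/" + child.name;
                out.add(childPath);
                collect(child, childPath, out);
            }
        }
    }

    public static void main(String[] args) {
        // A Word document embedding a Word document that embeds a workbook.
        Dir layer2 = new Dir("_2", "WorkBook");
        Dir layer1 = new Dir("_1", "WordDocument")
                .add(new Dir("ObjectPool").add(layer2));
        Dir root = new Dir("root", "WordDocument")
                .add(new Dir("ObjectPool").add(layer1));

        List<String> found = new ArrayList<>();
        collect(root, "root", found);
        for (String p : found) {
            System.out.println(p); // prints root/_1 then root/_1/_2
        }
    }
}
```

In this model the nested "ObjectPool" of "_1" is a child directory of "_1"
itself, so a copy of "_1" that takes only its streams and skips its child
directories would lose the layer-2 object.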

All objects are extracted OK, but it seems that an embedded object that
itself contains another embedded object gets damaged in the process. If I
open an extracted "layer-n" (n>1) document that has another document
embedded in it, I cannot open
the embedded document. Word just gives me an error saying it can't find the

Am I missing some records I need to copy from the original document which
are not located in the ObjectPool?

I'm thankful for all responses.
