poi-user mailing list archives

From "Yury Batrakov" <batra...@gmail.com>
Subject Re: Processing word documents recursively
Date Thu, 27 Mar 2008 15:47:01 GMT
Hello Nick and all!

I've started implementing a feature to process embedded OLE2 documents.
I've slightly modified HWPFDocument(POIFSFileSystem pfilesystem) to
accept the name of an OLE directory and look for all the required
streams there. The idea is to open the document, get its
POIFSFileSystem, then get the OLE directory that contains the embedded
document and pass it to the constructor. I tried my changes, but the
following exception occurred when the constructor was executed:

Directory: ObjectPool\_1262435934
Exception in thread "main" java.io.IOException: The text piece table
is corrupted
	at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:53)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:316)
	at ru.mera.ofa.ReadOLE1.main(ReadOLE1.java:28)

Please help me deal with this!

My patch for poi-src-3.0.2-FINAL-20080204 is attached (poi-3.0.2-FINAL.patch);
the sample code (ReadOLE1.java) and a Word file (word2003.doc) are attached too.

On 3/3/08, Nick Burch <nick@torchbox.com> wrote:
>  Looking at your code, I think I can see why. You open the POIFSFileSystem,
>  find interesting looking streams, then try to open these. Unfortunately,
>  word files are made up of a few different streams, and the streams don't
>  have the outer poifs header on them.
>  I'd suggest that when you find a WordDocument stream, and then identify
>  its matching table entry, you knock up a new POIFSFileSystem for them.
>  Give that the two streams, then call WordExtractor on that.
>  See the HWPFDocument(POIFSFileSystem pfilesystem) constructor for details
>  of what the streams need to be called.
>  Nick
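Nick's suggestion depends on finding the table stream that matches a given WordDocument stream. The following is a minimal sketch under one assumption: the file is in the Word 97-2003 binary format. In that format the FIB header at the start of the WordDocument stream holds a 16-bit little-endian flags word at offset 0x000A, and its fWhichTblStm bit (0x0200) says whether the table stream is named "1Table" or "0Table". HWPF makes the same choice from its FIB. The class and method names are hypothetical. Reading the WordDocument bytes out of the embedded ObjectPool directory, and copying WordDocument, the chosen table stream and (if present) Data into a fresh POIFSFileSystem (for example with DirectoryEntry.createDocument(String, InputStream)), is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: picks the table stream name for a WordDocument stream
// by reading the fWhichTblStm bit from the FIB header (Word 97-2003 assumed).
final class TableStreamName {
    // Offset of the 16-bit flags word in the FIB header.
    static final int FIB_FLAGS_OFFSET = 0x000A;
    // fWhichTblStm: set means "1Table", clear means "0Table".
    static final int F_WHICH_TBL_STM = 0x0200;

    private TableStreamName() {}

    static String tableStreamName(byte[] wordDocument) {
        if (wordDocument.length < FIB_FLAGS_OFFSET + 2) {
            throw new IllegalArgumentException("WordDocument stream too short for a FIB header");
        }
        // Read the flags word as an unsigned little-endian 16-bit value.
        int flags = ByteBuffer.wrap(wordDocument, FIB_FLAGS_OFFSET, 2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort() & 0xFFFF;
        return (flags & F_WHICH_TBL_STM) != 0 ? "1Table" : "0Table";
    }
}
```

Reading only the flags bit, rather than checking which of 0Table/1Table exists in the directory, matters because a Word file can contain both streams and only one of them belongs to the current document.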
