poi-user mailing list archives

From "Helmut Ziegler" <scruffyt...@gmx.net>
Subject Re: Can POIFS convert PDF to OLE
Date Thu, 24 Jul 2008 13:40:44 GMT
Hi Nick,

Thanks for your response!
I didn't use POIFSViewer, but I now know the structure of my PDF OLE object. Unfortunately,
this isn't enough...

Here is what I did:

First, I created a Word 2003 XML file in Word and imported a PDF file into it. The PDF was recognized
as a package (not as a PDF file) because no program for handling PDF files was installed on that computer.
These are the important parts:
<w:binData w:name="oledata.mso">

<o:OLEObject Type="Embed" ProgID="Package" ShapeID="_x0000_i1025" DrawAspect="Content"

In the Word XML file, the OLE object is Base64-encoded.
I decoded it and wrote it to a binary file (OleObject.bin), which I inspected first with 7-Zip and later
with POIFS.
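The extraction and decoding can be sketched with the JDK alone. This is a minimal sketch, not the code actually used here: it assumes the whole Word 2003 XML file is read into memory, parses it with the built-in DOM parser (not namespace-aware, so elements are matched by their literal `w:binData` name), and picks the element whose `w:name` is `oledata.mso`. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: pulls a named <w:binData> element out of a Word 2003 XML file. */
public class BinDataExtractor {

    /** Returns the decoded bytes of the first <w:binData> whose w:name matches, or null. */
    static byte[] extractBinData(byte[] wordXml, String binDataName) throws Exception {
        // The default factory is not namespace-aware, so "w:binData" is matched literally.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        NodeList nodes = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(wordXml))
                .getElementsByTagName("w:binData");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            if (binDataName.equals(element.getAttribute("w:name"))) {
                // The MIME decoder ignores line breaks and other non-Base64 characters,
                // in case the encoded text is wrapped inside the element.
                return Base64.getMimeDecoder().decode(element.getTextContent());
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0] is the path of the Word 2003 XML file.
        byte[] xml = Files.readAllBytes(Path.of(args[0]));
        byte[] oleObject = extractBinData(xml, "oledata.mso");
        if (oleObject == null) {
            System.err.println("no <w:binData w:name=\"oledata.mso\"> element found");
            return;
        }
        Files.write(Path.of("OleObject.bin"), oleObject);
    }
}
```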

The structure of OleObject.bin is as follows:
+ Root entry
++ _1277043057

The Ole10Native stream contains my PDF, preceded by a custom header that Word added.
To get to this content I had to:
1. Create a POIFSFileSystem based on OleObject.bin.
2. Get the entry "_1277043057" and write it to the hard disk (as "_1277043057").
3. Strip the first 4 bytes of "_1277043057".
4. Use the inflate algorithm to decompress it as "_1277043057_decompressed".
5. Create a POIFSFileSystem again, based on the decompressed "_1277043057_decompressed".
6. Write the contents listed above to the hard disk.
==> I could then open my PDF file.
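Steps 3 and 4 need nothing beyond `java.util.zip`. A minimal sketch under two assumptions not confirmed in this message: that the bytes after the 4-byte prefix form a zlib-wrapped deflate stream (for raw deflate data, construct the `Inflater` with `nowrap` set to `true`), and that the prefix can simply be skipped. It also checks for the 8-byte OLE2 compound-file signature `D0 CF 11 E0 A1 B1 1A E1`, which the inflated data has to start with for step 5 to work. The names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical helper for steps 3 and 4. */
public class EntryUnpacker {

    /** The signature every OLE2 compound file (the format POIFS reads) starts with. */
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Drops the first 4 bytes and inflates the rest (assumed zlib-wrapped). */
    static byte[] stripAndInflate(byte[] entryBytes) throws DataFormatException {
        if (entryBytes.length < 4) {
            throw new IllegalArgumentException("entry is shorter than the 4-byte prefix");
        }
        Inflater inflater = new Inflater();
        inflater.setInput(entryBytes, 4, entryBytes.length - 4);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and no more input: the compressed data is truncated.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unexpected deflate data");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** True if the data starts with the OLE2 compound-file signature. */
    static boolean looksLikeOle2(byte[] data) {
        return data.length >= OLE2_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }
}
```

If `looksLikeOle2` returns false for the inflated data, step 5 cannot succeed, which narrows down where a round trip breaks.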

So far, so good. Next I tried the reverse direction. After repackaging the content and trying to
open the file in Word, Word complained that it could not open the file:
"The server application, the source file, or the element wasn't found" (my own translation of the message).

Then I looked for the step that fails.
Steps 1 to 4 also worked in the other direction, but creating "_1277043057_decompressed" seemed
not to work.
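The reverse of steps 3 and 4 can be sketched the same way. This minimal sketch assumes zlib-wrapped deflate output and leaves the 4-byte prefix to the caller: what those bytes must contain is not established here, so reusing the 4 bytes of the original entry is only a starting point, and if they encode a length they would have to be recomputed. The names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Hypothetical helper for the reverse of steps 3 and 4. */
public class EntryPacker {

    /** Deflates data (zlib-wrapped, an assumption) and puts the given 4-byte prefix in front. */
    static byte[] deflateWithPrefix(byte[] prefix, byte[] data) {
        if (prefix.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte prefix");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(prefix, 0, prefix.length);
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        try {
            // Keep draining compressed output until the deflater reports completion.
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }
}
```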
When I compared the original "_1277043057_decompressed" with the generated one, there were
many similarities (file size and most of the content). But the first part of the original file
contains more information.
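A comparison like this can be made precise by locating the first byte at which the two files differ. A minimal JDK-only sketch with hypothetical names; on Java 12 and later, `java.nio.file.Files.mismatch(Path, Path)` returns the same offset, or -1 for identical files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reports where two binary files start to differ. */
public class FirstDifference {

    /** Returns the first offset where a and b differ, or -1 if they are identical. */
    static long firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Equal up to the shorter length: they differ only if the lengths differ.
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the original file, args[1] the generated one.
        byte[] original = Files.readAllBytes(Path.of(args[0]));
        byte[] generated = Files.readAllBytes(Path.of(args[1]));
        long offset = firstDifference(original, generated);
        System.out.println(offset < 0
                ? "identical"
                : "first difference at offset 0x" + Long.toHexString(offset));
    }
}
```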
I had a look at it in a text editor. The information is some kind of metadata:
1. The alphabet
2. The structure of the OLE object: "R.o.o.t. .E.n.t.r.y .... O.l.e. ... C.o.m.p.O.b.j...."
3. The kind of OLE object: "P.a.c.k.a.g.e"
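The dots between the letters are most likely zero bytes: an OLE2 compound file stores directory entry names such as "Root Entry" in UTF-16LE, so each ASCII letter is followed by a 0x00 byte that a text editor shows as a dot. A minimal JDK-only sketch that reports which of the names from the list above occur, encoded that way, in a file, so the original and the generated "_1277043057_decompressed" can be checked side by side. The searched names come from this message; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: looks for UTF-16LE encoded names in a binary file. */
public class Utf16NameScan {

    /** True if name, encoded as UTF-16LE (no byte-order mark), occurs anywhere in data. */
    static boolean containsUtf16Le(byte[] data, String name) {
        byte[] needle = name.getBytes(StandardCharsets.UTF_16LE);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the file to inspect, e.g. "_1277043057_decompressed".
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        for (String name : new String[] {"Root Entry", "Ole", "CompObj", "Package"}) {
            System.out.println(name + ": " + (containsUtf16Le(data, name) ? "present" : "missing"));
        }
    }
}
```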

Does anyone know how I can get this information into my file?


P.S. The reverse engineering is based on this excellent article

-------- Original Message --------
> Date: Thu, 24 Jul 2008 11:42:10 +0100 (BST)
> From: Nick Burch <nick@torchbox.com>
> To: POI Users List <user@poi.apache.org>
> Subject: Re: Can POIFS convert PDF to OLE

> On Thu, 24 Jul 2008, Helmut Ziegler wrote:
> > Actually the Word document should also carry other documents like other 
> > word files.
> I'd suggest dumping out the stream(s) and looking at them with tools
> like org.apache.poi.poifs.dev.POIFSViewer.
> Start by seeing if you can change one bit of one file in the POIFS stream
> and have the change noticed. If that works, but adding a new POIFS stream
> doesn't, then there are extra things in the POIFS stream that need to be
> set up. I think you're probably going to need to run diff quite a bit
> across two files (one that works, one that doesn't) and see what's
> different.
> Nick
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
