poi-user mailing list archives

From "Helmut Ziegler" <scruffyt...@gmx.net>
Subject Re: Can POIFS convert PDF to OLE
Date Thu, 24 Jul 2008 18:08:46 GMT
Hi,

I haven't made any progress, but I now know a bit more about the "upper part" (see below).

> But in the first part of the original file there is more information.
> I had a look at it in a text editor. The information is some kind of
> metadata:
> 1. The alphabet
> 2. The structure of the ole object. "R.o.o.t. .E.n.t.r.y .... O.l.e. ...
> C.o.m.p.O.b.j...."
> 3. The kind of ole object "P.a.c.k.a.g.e"

The compound object "_1277043057_decompressed_generated" that I generated from
[3]ObjectInfo
[1]Ole10Native
[1]Ole
[1]CompObj
has a slightly different structure than the original "_1277043057_decompressed_original".
The "metadata" is just in a different place. I think it's the directory structure of the compound
file and the other objects [3]ObjectInfo, [1]Ole, [1]CompObj.
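
The "R.o.o.t. .E.n.t.r.y" text is what a UTF-16LE name looks like in a text editor: every second byte is zero. In the compound file format, directory entries are 128 bytes each, so a 512-byte sector holds four of them. Below is a minimal Java sketch for decoding one entry, using only the JDK. The class name DirEntrySketch is made up for illustration and is not part of POI, and it reads only the name, type, start sector and size fields.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a POI class): decodes one 128-byte directory entry. */
final class DirEntrySketch {
    static final int ENTRY_SIZE = 128;

    final String name;      // entry name without the terminating null character
    final int type;         // 1 = storage, 2 = stream, 5 = root storage, 0 = unused
    final int startSector;  // first sector of the entry's data
    final long size;        // stream size in bytes (low 32 bits only)

    DirEntrySketch(byte[] sector, int index) {
        int base = index * ENTRY_SIZE;
        ByteBuffer b = ByteBuffer.wrap(sector).order(ByteOrder.LITTLE_ENDIAN);
        // Offset 64: name length in bytes, including the 2-byte terminator.
        int nameBytes = b.getShort(base + 64) & 0xFFFF;
        int len = Math.min(64, Math.max(0, nameBytes - 2));
        name = new String(sector, base, len, StandardCharsets.UTF_16LE);
        type = sector[base + 66] & 0xFF;
        startSector = b.getInt(base + 116);
        size = b.getInt(base + 120) & 0xFFFFFFFFL;
    }

    /** Shows control characters in the bracket notation used in this thread, e.g. [1]Ole. */
    String displayName() {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 0x20) {
                sb.append('[').append((int) c).append(']');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Five entries (the root plus four streams) do not fit into one 512-byte sector, which may be why the directory shows up in two separate blocks in both files.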

In the original (Word-generated) file the structure is similar to this:
1. Part (512 bytes): Header (probably addresses of the other parts, the rest is padding)
2. Part (512 bytes): The alphabet in this form "A...B...C..."
3. Part (512 bytes): The directory structure "R.o.o.t. .E.n.t.r.y....O.b.j.I.n.f.o." (without
Ole10Native!)
4. Part (512 bytes): Unknown block (maybe the first part signals the end of the directory structure,
the rest is padding)
5. Part (512 bytes): This seems to be the first content block, as it holds the content of [1]Ole,
[3]ObjectInfo and [1]CompObj (every content part is padded with "00").
6. Part (512 bytes): Here comes another directory structure, but now only with "O.l.e.1.0.N.a.t.i.v.e"
7. Part (512 bytes): Unknown block (again it may signal the end of the file structure)
8. Part (rest of file): The content for [1]Ole10Native ==> the PDF
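
The guess about the header can be checked directly. Below is a minimal sketch, assuming the file follows the standard compound file header layout (signature at offset 0, sector shift at offset 30, FAT sector count at 44, first directory sector at 48, first mini FAT sector at 60, all little-endian). CfbHeaderDump is a hypothetical helper, not a POI class.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not a POI class): reads a few fields of the 512-byte header. */
final class CfbHeaderDump {
    /** Signature bytes D0 CF 11 E0 A1 B1 1A E1, read as one little-endian long. */
    static final long SIGNATURE = 0xE11AB1A1E011CFD0L;

    final int sectorSize;
    final int fatSectorCount;
    final int firstDirectorySector;
    final int firstMiniFatSector;

    CfbHeaderDump(byte[] file) {
        if (file.length < 512) {
            throw new IllegalArgumentException("shorter than one header block");
        }
        ByteBuffer b = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (b.getLong(0) != SIGNATURE) {
            throw new IllegalArgumentException("compound file signature not found");
        }
        sectorSize = 1 << b.getShort(30);   // a shift of 9 means 512-byte sectors
        fatSectorCount = b.getInt(44);
        firstDirectorySector = b.getInt(48);
        firstMiniFatSector = b.getInt(60);  // -2 marks "no mini FAT"
    }

    /** Sector n starts after the header, which occupies the space of one sector. */
    long offsetOfSector(int n) {
        return (long) (n + 1) * sectorSize;
    }

    public static void main(String[] args) throws IOException {
        CfbHeaderDump h = new CfbHeaderDump(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("sector size:            " + h.sectorSize);
        System.out.println("FAT sectors:            " + h.fatSectorCount);
        System.out.println("first directory sector: " + h.firstDirectorySector
                + " (file offset " + h.offsetOfSector(h.firstDirectorySector) + ")");
        System.out.println("first mini FAT sector:  " + h.firstMiniFatSector);
    }
}
```

Sector numbers count from the block right after the header, so with 512-byte sectors the "2. Part" in the list above is sector 0 and the "3. Part" is sector 1.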

The structure of the file that was generated using POIFS:
1. Part (512 bytes): Header (probably addresses of the other parts, the rest is padding)
2. Part (.... bytes): The content for [1]Ole10Native ==> the PDF
3. Part (512 bytes): The directory structure "R.o.o.t. .E.n.t.r.y....O.b.j.I.n.f.o." (without
Ole10Native)
4. Part (512 bytes): The directory structure for "O.l.e.1.0.N.a.t.i.v.e"
5. Part (512 bytes): This is the content block for the content parts [1]Ole, [3]ObjectInfo
and [1]CompObj (every content part is padded with "FF", in contrast to the original file).
6. Part: Unknown part (mostly padded with FF)
7. Part: The alphabet in this form "A...B...C..."
8. Part: Unknown part (seems to be part 7 of the original)

So the main differences are:
a) the divided directory structure in the original (Word-generated) file
b) Ole10Native comes before all other objects, and even before the directory structure, in the
POIFS-generated file
c) content parts are normally padded with 00 in the original file and with FF in the
POIFS-generated file
Maybe some of these differences aren't a problem but I still can't open the ole object I generated
with POIFS in Word...
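
To compare the padding (difference c) without a hex editor, each 512-byte block can be summarised by its last byte and the length of the run of that byte at the end of the block. Below is a minimal sketch. SectorPaddingSketch is a hypothetical name, and the block size of 512 is taken from the lists above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reports the trailing run of identical bytes in each 512-byte block. */
final class SectorPaddingSketch {
    static final int SECTOR = 512;

    /** Returns {last byte value, number of identical bytes at the end of the block}. */
    static int[] trailingPadding(byte[] file, int block) {
        int start = block * SECTOR;
        int end = Math.min(file.length, start + SECTOR);
        if (block < 0 || start >= end) {
            throw new IllegalArgumentException("no such block: " + block);
        }
        int pad = file[end - 1] & 0xFF;
        int run = 0;
        // Walk backwards from the end of the block while the byte value stays the same.
        while (end - 1 - run >= start && (file[end - 1 - run] & 0xFF) == pad) {
            run++;
        }
        return new int[] {pad, run};
    }

    public static void main(String[] args) throws IOException {
        byte[] file = Files.readAllBytes(Paths.get(args[0]));
        int blocks = (file.length + SECTOR - 1) / SECTOR;
        for (int i = 0; i < blocks; i++) {
            int[] p = trailingPadding(file, i);
            System.out.printf("block %d: %d trailing byte(s) of 0x%02X%n", i, p[1], p[0]);
        }
    }
}
```

Running it on both files gives two short listings that can be compared side by side. A block that ends in real data simply reports a very short run.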

Cheers,
Helmut


-------- Original Message --------
> Date: Thu, 24 Jul 2008 15:40:44 +0200
> From: "Helmut Ziegler" <scruffytech@gmx.net>
> To: "POI Users List" <user@poi.apache.org>
> Subject: Re: Can POIFS convert PDF to OLE

> Hi Nick,
> 
> thanks for your response!
> I didn't use POIFSViewer but I know (now) the structure of my PDF Ole
> Object. Unfortunately this isn't enough ...
> 
> Here is what I did:
> 
> First of all I created a Word 2003 XML file with Word and imported a PDF
> file. The PDF is recognized as a package (not as a PDF file) because there was
> no program to handle PDF files on that computer.
> These are the important parts:
> <w:docOleData>
> <w:binData w:name="oledata.mso">
> 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/
> ...
> </w:binData></w:docOleData>
> 
> <o:OLEObject Type="Embed" ProgID="Package" ShapeID="_x0000_i1025"
> DrawAspect="Content" ObjectID="_1277043057"/>
> 
> In the word xml file the ole object is base64 encoded.
> I decoded it and wrote a binary file (OleObject.bin) that I inspected
> (first with 7-zip, later with POIFS)
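
A quick sanity check on the decoded data: an OLE2 compound file starts with the eight bytes D0 CF 11 E0 A1 B1 1A E1, and the first twelve base64 characters of the payload quoted above ("0M8R4KGxGuEA") decode to exactly that signature plus one zero byte. Below is a minimal sketch using java.util.Base64. OleSignatureCheck is a hypothetical name.

```java
import java.util.Base64;

/** Hypothetical helper: checks whether base64 text starts with the compound file signature. */
final class OleSignatureCheck {
    static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean looksLikeCompoundFile(String base64) {
        // The w:binData content is wrapped over several lines, so drop whitespace first.
        String compact = base64.replaceAll("\\s+", "");
        if (compact.length() < 12) {
            return false;
        }
        // 12 base64 characters decode to 9 bytes, enough to cover the 8-byte signature.
        byte[] head = Base64.getDecoder().decode(compact.substring(0, 12));
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (head[i] != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For the full payload, Base64.getMimeDecoder() tolerates the embedded line breaks and returns the bytes to write to OleObject.bin.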
> 
> The structure of OleObject.bin is the following
> + Root entry
> ++ _1277043057
> +++[3]ObjectInfo
> +++[1]Ole10Native
> +++[1]Ole
> +++[1]CompObj
> 
> Ole10Native represents my PDF with a custom header that Word attached.
> To get to this content I had to:
> 1. Create a POIFSFileSystem based on OleObject.bin
> 2. Get the entry "_1277043057" and write it to the hard disk (as
> "_1277043057").
> 3. Strip the first 4 bytes of "_1277043057"
> 4. Use the inflate algorithm to decompress it as
> "_1277043057_decompressed"
> 5. Create a POIFSFileSystem again, based on the decompressed file
> ("_1277043057_decompressed")
> 6. Write the contents listed above to the hard disk.
> ==> I could then open my PDF file.
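
Steps 3 and 4, and their reverse, might look like this with the JDK's java.util.zip classes. This is a sketch under assumptions that the thread does not confirm: the data after the 4-byte prefix is assumed to be zlib-wrapped deflate data (what a default Inflater expects), and since the meaning of the 4 prefix bytes is not stated, pack() simply takes them from the caller. StreamUnpackSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: strips a 4-byte prefix and inflates the rest, and the reverse. */
final class StreamUnpackSketch {
    static final int PREFIX_LENGTH = 4;

    /** Steps 3 and 4: skip the prefix, then inflate (assumes zlib-wrapped data). */
    static byte[] unpack(byte[] raw) {
        if (raw.length < PREFIX_LENGTH) {
            throw new IllegalArgumentException("shorter than the 4-byte prefix");
        }
        Inflater inflater = new Inflater();
        inflater.setInput(raw, PREFIX_LENGTH, raw.length - PREFIX_LENGTH);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("data is not in zlib format", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Reverse direction: deflate and put the caller-supplied 4 bytes in front. */
    static byte[] pack(byte[] plain, byte[] prefix) {
        if (prefix.length != PREFIX_LENGTH) {
            throw new IllegalArgumentException("prefix must be 4 bytes");
        }
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(prefix, 0, PREFIX_LENGTH);
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

If the repacked stream is rejected, the 4 prefix bytes of the original stream are worth comparing against the generated ones before looking further.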
> 
> So far, so good. Then I tried it the other way round. After packaging the content
> again and trying to open the file in Word, Word complained that it couldn't open
> the file because
> "The server application, the source file, or the element wasn't found"
> (this is only a translation)
> 
> Then I looked for the step that fails.
> Steps 1 to 4 also worked in the other direction, but creating
> "_1277043057_decompressed" did not seem to work.
> When I compared the original "_1277043057_decompressed" to the
> generated one, there were many similarities (file size and most of the content). But
> in the first part of the original file there is more information.
> I had a look at it in a text editor. The information is some kind of
> metadata:
> 1. The alphabet
> 2. The structure of the ole object. "R.o.o.t. .E.n.t.r.y .... O.l.e. ...
> C.o.m.p.O.b.j...."
> 3. The kind of ole object "P.a.c.k.a.g.e"
> 
> 
> Does anyone know how I get this information into my file?
> 
> Cheers,
> Helmut
> 
> P.S. The reverse engineering is based on this excellent article
> http://www.trustedsource.org/download/research_publications/CAlme_VBOct06.pdf
> 
> 
> 
> ----
> -------- Original Message --------
> > Date: Thu, 24 Jul 2008 11:42:10 +0100 (BST)
> > From: Nick Burch <nick@torchbox.com>
> > To: POI Users List <user@poi.apache.org>
> > Subject: Re: Can POIFS convert PDF to OLE
> 
> > On Thu, 24 Jul 2008, Helmut Ziegler wrote:
> > > Actually the Word document should also carry other documents like other
> > > word files.
> > 
> > I'd suggest dumping out the stream(s), and looking at them with things 
> > like org.apache.poi.poifs.dev.POIFSViewer
> > 
> > Start by seeing if you can change one bit of one file in the poifs stream,
> > and have the change noticed. If that works, but adding a new poifs stream
> > doesn't, then there are extra things in the poifs stream that need to be
> > set up. I think you're probably going to need to run diff quite a bit,
> > across two files (one that works, one that doesn't) and see what's
> > different
> > 
> > Nick
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> > For additional commands, e-mail: user-help@poi.apache.org