poi-user mailing list archives

From "Yury Batrakov" <batra...@gmail.com>
Subject Re: Can POIFS convert PDF to OLE
Date Thu, 24 Jul 2008 14:05:18 GMT
Hi Helmut,

As far as I remember, this is an OLE header. I decoded embedded OLE objects
from RTF and they looked similar to yours. The Microsoft RTF spec says:
"When the object is an OLE embedded or linked object, the data part of
the object is the structure produced by the OLESaveToStream function".
I tried to reverse-engineer the format and read Wine's source for
OleSaveToStream and OleLoadFromStream, but gave up soon because this
feature wasn't mandatory in our product.

I hope this helps you somehow. Good luck, and please keep this
mailing list posted on any progress.


On 7/24/08, Helmut Ziegler <scruffytech@gmx.net> wrote:
> Hi Nick,
>
> thanks for your response!
> I didn't use POIFSViewer, but I (now) know the structure of my PDF OLE
> object. Unfortunately this isn't enough ...
>
> Here is what I did:
>
> First of all I created a Word 2003 XML file with Word and imported a PDF
> file. The PDF is recognized as a package (not as a PDF file) because there
> was no program to handle PDF files on that computer.
> These are the important parts:
> <w:docOleData>
> <w:binData w:name="oledata.mso">
> 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/
> ...
> </w:binData></w:docOleData>
>
> <o:OLEObject Type="Embed" ProgID="Package" ShapeID="_x0000_i1025"
> DrawAspect="Content" ObjectID="_1277043057"/>
>
> In the Word XML file the OLE object is base64-encoded.
> I decoded it and wrote a binary file (OleObject.bin) that I inspected (first
> with 7-Zip, later with POIFS).
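
The decoding step described above can be sketched with the JDK alone. This is only an illustration: the class and method names (OleMagicCheck, decodeBinData, looksLikeOle2) are made up here, and the input is just the first line of the binData payload quoted in the message. It decodes the base64 text and checks for the OLE2 compound file signature (D0 CF 11 E0 A1 B1 1A E1), which is what "0M8R4KGxGuE..." decodes to.

```java
import java.util.Arrays;
import java.util.Base64;

final class OleMagicCheck {
    // OLE2 compound file signature: D0 CF 11 E0 A1 B1 1A E1
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Decodes w:binData text; the MIME decoder skips line breaks in the payload. */
    static byte[] decodeBinData(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    /** True if the data starts with the OLE2 compound file signature. */
    static boolean looksLikeOle2(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    public static void main(String[] args) {
        // First line of the binData payload quoted in the message.
        byte[] head = decodeBinData("0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/");
        System.out.println("OLE2 signature present: " + looksLikeOle2(head));
    }
}
```

Writing the full decoded array to OleObject.bin (for example with java.nio.file.Files.write) would give the file that 7-Zip or POIFS can open.
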
>
> The structure of OleObject.bin is the following
> + Root entry
> ++ _1277043057
> +++[3]OleObjectInfo
> +++[1]Ole10Native
> +++[1]Ole
> +++[1]CompObj
>
> Ole10Native represents my PDF with a custom header that Word attached.
> To get to this content I had to:
> 1. Create a POIFSFileSystem based on OleObject.bin
> 2. Get the entry "_1277043057" and write it to the hard disk (as
> "_1277043057").
> 3. Strip the first 4 bytes of "_1277043057"
> 4. Use the inflate algorithm to decompress it as "_1277043057_decompressed"
> 5. Create a POIFSFileSystem again based on the decompressed file
> ("_1277043057_decompressed")
> 6. Write the contents listed above to the hard disk.
> ==> I could then open my PDF file.
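
Steps 3 and 4 above, and their reverse, can be sketched with the JDK alone. This is a sketch under assumptions, not a confirmed description of the format: it assumes the data after the 4-byte prefix is zlib-wrapped deflate (what java.util.zip.Inflater expects by default), and it treats the prefix as opaque bytes because the thread does not say what they mean. The names PrefixedInflate, unpack and pack are made up for illustration. Steps 1, 2, 5 and 6 would go through POI's POIFSFileSystem and are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class PrefixedInflate {
    static final int PREFIX_LEN = 4; // bytes stripped in step 3

    /** Skips the 4-byte prefix and inflates the rest (assumed zlib-wrapped). */
    static byte[] unpack(byte[] stream) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stream, PREFIX_LEN, stream.length - PREFIX_LEN);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not zlib-wrapped deflate data", e);
        } finally {
            inflater.end();
        }
    }

    /** Reverse direction: deflates the payload and puts the given 4-byte prefix in front. */
    static byte[] pack(byte[] prefix, byte[] payload) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(payload);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(prefix, 0, PREFIX_LEN);
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] payload = "example payload".getBytes(StandardCharsets.UTF_8);
        byte[] packed = pack(new byte[] {0, 0, 0, 0}, payload); // placeholder prefix
        System.out.println(new String(unpack(packed), StandardCharsets.UTF_8));
    }
}
```

If the prefix turns out to encode something about the payload (a length, for instance), reusing the original bytes after changing the content would no longer be valid, so that is worth checking when repacking.
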
>
> So far, so good. Then I tried it the other way round. After packaging the
> content again, I tried to open the file in Word, but Word complained that it
> can't open the file because
> "The server application, the source file, or the element wasn't found"
> (this is only a translation).
>
> Then I looked for the step that fails.
> Steps 1 to 4 also worked in the other direction, but creating
> "_1277043057_decompressed" did not seem to work.
> When I compared the original "_1277043057_decompressed" to the generated
> one, there were many similarities (file size and most of the content). But
> the first part of the original file contains more information.
> I had a look at it in a text editor. The information is some kind of
> metadata:
> 1. The alphabet
> 2. The structure of the ole object. "R.o.o.t. .E.n.t.r.y .... O.l.e. ...
> C.o.m.p.O.b.j...."
> 3. The kind of ole object "P.a.c.k.a.g.e"
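
The dotted names in item 2 look like UTF-16LE text viewed in a byte-oriented editor: OLE2 compound files store directory entry names as NUL-terminated UTF-16LE, so "Root Entry" shows up as "R.o.o.t. .E.n.t.r.y". If that reading is right, this part of the file is the compound file's directory, which a POIFS writer would normally generate itself. A minimal sketch of the encoding, with made-up names (Utf16Names, decodeName, encodeName):

```java
import java.nio.charset.StandardCharsets;

final class Utf16Names {
    /** Decodes a UTF-16LE name field, stopping at the first NUL terminator if present. */
    static String decodeName(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_16LE);
        int nul = s.indexOf('\0');
        return nul >= 0 ? s.substring(0, nul) : s;
    }

    /** Encodes a name as UTF-16LE with a trailing NUL terminator. */
    static byte[] encodeName(String name) {
        return (name + "\0").getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Bytes as a text editor would show them: R . o . o . t .
        byte[] raw = {'R', 0, 'o', 0, 'o', 0, 't', 0, ' ', 0,
                      'E', 0, 'n', 0, 't', 0, 'r', 0, 'y', 0, 0, 0};
        System.out.println(decodeName(raw));
    }
}
```
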
>
>
> Does anyone know how I can get this information into my file?
>
> Cheers,
> Helmut
>
> P.S. The reverse engineering is based on this excellent article
> http://www.trustedsource.org/download/research_publications/CAlme_VBOct06.pdf
>
>
>
> ----
> -------- Original Message --------
>> Date: Thu, 24 Jul 2008 11:42:10 +0100 (BST)
>> From: Nick Burch <nick@torchbox.com>
>> To: POI Users List <user@poi.apache.org>
>> Subject: Re: Can POIFS convert PDF to OLE
>
>> On Thu, 24 Jul 2008, Helmut Ziegler wrote:
>> > Actually the Word document should also carry other documents, such as
>> > other Word files.
>>
>> I'd suggest dumping out the stream(s), and looking at them with things
>> like org.apache.poi.poifs.dev.POIFSViewer
>>
>> Start by seeing if you can change one bit of one file in the poifs stream,
>> and have the change noticed. If that works, but adding a new poifs stream
>> doesn't, then there are extra things in the poifs stream that need to be
>> set up. I think you're probably going to need to run diff quite a bit,
>> across two files (one that works, one that doesn't) and see what's
>> different.
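
For binary streams, a byte-level comparison can stand in for diff. The following is a minimal sketch with made-up names (ByteDiff, diffRanges), not an existing POI tool: it reports the offset ranges where two byte arrays differ, plus a final range if their lengths differ.

```java
import java.util.ArrayList;
import java.util.List;

final class ByteDiff {
    /**
     * Returns {start, endExclusive} offset pairs where a and b differ.
     * A length difference is reported as one extra range at the end.
     */
    static List<int[]> diffRanges(byte[] a, byte[] b) {
        List<int[]> ranges = new ArrayList<>();
        int common = Math.min(a.length, b.length);
        int i = 0;
        while (i < common) {
            if (a[i] == b[i]) {
                i++;
                continue;
            }
            int start = i;
            while (i < common && a[i] != b[i]) {
                i++;
            }
            ranges.add(new int[] {start, i});
        }
        if (a.length != b.length) {
            ranges.add(new int[] {common, Math.max(a.length, b.length)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[] working = {1, 2, 3, 4, 5};
        byte[] broken = {1, 9, 9, 4, 5, 6};
        for (int[] r : diffRanges(working, broken)) {
            System.out.printf("bytes 0x%X..0x%X differ%n", r[0], r[1] - 1);
        }
    }
}
```

Feeding it the contents of the working and the non-working file (read with java.nio.file.Files.readAllBytes) would show which regions to inspect in a hex editor.
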
>>
>> Nick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>