From: "Helmut Ziegler"
To: "POI Users List"
Subject: Re: Can POIFS convert PDF to OLE
Date: Thu, 24 Jul 2008 20:08:46 +0200

Hi,

I haven't made any progress, but I now know a bit more about the "upper part" (see below).

> But in the first part of the original file there is more information.
> I had a look at it in a text editor. The information is some kind of
> metadata:
> 1. The alphabet
> 2. The structure of the ole object. "R.o.o.t. .E.n.t.r.y .... O.l.e. ...
>    C.o.m.p.O.b.j...."
> 3. The kind of ole object "P.a.c.k.a.g.e"

The compound object "_1277043057_decompressed_generated" that I generated from

  [3]ObjectInfo
  [1]Ole10Native
  [1]Ole
  [1]CompObj

has a slightly different structure than the original "_1277043057_decompressed_original". The "metadata" is just in another place. I think it's the directory structure of the compound file and the other objects [3]ObjectInfo, [1]Ole, [1]CompObj.

In the original (Word-generated) file the structure is similar to this:

1. Part (512 Byte): Header (probably addresses of the other parts, with the rest padded)
2. Part (512 Byte): The alphabet in this form "A...B...C..."
3. Part (512 Byte): The directory structure "R.o.o.t. .E.n.t.r.y....O.b.j.I.n.f.o." (without Ole10Native!)
4. Part (512 Byte): Unknown block (maybe the first part signals the end of the directory structure, the rest is padded)
5. Part (512 Byte): This seems to be the first content block, as it holds the content of [1]Ole, [3]ObjectInfo and [1]CompObj (every content part is padded with "00").
6. Part (512 Byte): Here comes a directory structure again, but now only with "O.l.e.1.0.N.a.t.i.v.e"
7. Part (512 Byte): Unknown block (again it may signal the end of the file structure)
8. Part (Rest of file): The content for [1]Ole10Native ==> the pdf

The structure of the file that was generated using POIFS:

1. Part (512 Byte): Header (probably addresses of the other parts, with the rest padded)
2. Part (.... Byte): The content for [1]Ole10Native ==> the pdf
3. Part (512 Byte): The directory structure "R.o.o.t. .E.n.t.r.y....O.b.j.I.n.f.o." (without Ole10Native)
4. Part (512 Byte): The directory structure for "O.l.e.1.0.N.a.t.i.v.e"
5. Part (512 Byte): This is the content block for the content parts [1]Ole, [3]ObjectInfo and [1]CompObj (every content part is padded with "FF", in contrast to the original file).
6. Part: Unknown part (mostly padded with FF)
7. Part: The alphabet in this form "A...B...C..."
8. Part: Unknown part (seems to be part 7 of the original)

So the main differences are:

a) the directory structure is divided in the original (Word-generated) file
b) in the POIFS-generated file, Ole10Native comes before all other objects and even before the directory structure
c) content parts are normally padded with 00 in the original file and with FF in the POIFS-generated file

Maybe some of these differences aren't a problem, but I still can't open the ole object I generated with POIFS in Word...

Cheers,
Helmut

-------- Original Message --------
> Date: Thu, 24 Jul 2008 15:40:44 +0200
> From: "Helmut Ziegler"
> To: "POI Users List"
> Subject: Re: Can POIFS convert PDF to OLE

> Hi Nick,
>
> thanks for your response!
> I didn't use POIFSViewer but I know (now) the structure of my PDF Ole
> Object. Unfortunately this isn't enough ...
>
> Here is what I did:
>
> First of all I created a Word2003 xml file with Word and imported a pdf
> file. The PDF is recognized as a package (not as a pdf file) as there
> wasn't a program to handle pdf files on that computer.
> These are the important parts:
>
> 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/
> ...
>
> DrawAspect="Content" ObjectID="_1277043057"/>
>
> In the word xml file the ole object is base64 encoded.
> I decoded it and wrote a binary file (OleObject.bin) that I inspected
> (first with 7-zip, later with POIFS).
>
> The structure of OleObject.bin is the following:
> + Root entry
> ++ _1277043057
> +++[3]ObjectInfo
> +++[1]Ole10Native
> +++[1]Ole
> +++[1]CompObj
>
> Ole10Native represents my pdf with a custom header that Word attached.
> To get to this content I had to:
> 1. Create a POIFSFileSystem based on OleObject.bin
> 2. Get the entry "_1277043057" and write it to the hard disk (as
>    "_1277043057").
> 3. Strip the first 4 bytes of "_1277043057"
> 4. Use the inflate algorithm to decompress it as
>    "_1277043057_decompressed"
> 5. Create a POIFSFileSystem again (based on the decompressed
>    "_1277043057_decompressed")
> 6. Write the contents listed above to the hard disk.
> ==> I could then open my PDF file.
>
> So far, so good. Now I tried it vice versa. After I packaged the content
> again and tried to open the file in Word, Word complained that it can't
> open the file because
> "The server application, the source file, or the element wasn't found"
> (this is only a translation)
>
> Then I looked for the step that fails.
> Steps 1 to 4 also worked in the other direction, but creating
> "_1277043057_decompressed" seemed not to work.
> When I compared the original "_1277043057_decompressed" to the generated
> one, there were many similarities (file size and most of the content).
> But in the first part of the original file there is more information.
> I had a look at it in a text editor. The information is some kind of
> metadata:
> 1. The alphabet
> 2. The structure of the ole object. "R.o.o.t. .E.n.t.r.y .... O.l.e. ...
>    C.o.m.p.O.b.j...."
> 3. The kind of ole object "P.a.c.k.a.g.e"
>
> Does anyone know how I get this information into my file?
>
> Cheers,
> Helmut
>
> P. S. The reverse engineering is based on this excellent article
> http://www.trustedsource.org/download/research_publications/CAlme_VBOct06.pdf
>
> -------- Original Message --------
> > Date: Thu, 24 Jul 2008 11:42:10 +0100 (BST)
> > From: Nick Burch
> > To: POI Users List
> > Subject: Re: Can POIFS convert PDF to OLE
>
> > On Thu, 24 Jul 2008, Helmut Ziegler wrote:
> > > Actually the Word document should also carry other documents like
> > > other word files.
> >
> > I'd suggest dumping out the stream(s), and looking at them with things
> > like org.apache.poi.poifs.dev.POIFSViewer
> >
> > Start by seeing if you can change one bit of one file in the poifs
> > stream, and have the change noticed. If that works, but adding a new
> > poifs stream doesn't, then there are extra things in the poifs stream
> > that need to be set up. I think you're probably going to need to run
> > diff quite a bit, across two files (one that works, one that doesn't)
> > and see what's different
> >
> > Nick
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> > For additional commands, e-mail: user-help@poi.apache.org
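
Steps 3 and 4 of the quoted procedure (strip the first 4 bytes of the "_1277043057" stream, then inflate the rest) could be sketched roughly as follows, using only java.util.zip. This is a minimal sketch under assumptions, not the code used in the thread: the class name StorageUnpacker and the method unpack are hypothetical, the 4 leading bytes are skipped without being interpreted (their meaning is not established in the message), and the payload is assumed to be zlib-wrapped deflate data. If the data turned out to be raw deflate, new Inflater(true) would be the variant to try.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper illustrating steps 3 and 4 of the quoted message.
final class StorageUnpacker {

    // Number of leading bytes to skip (step 3). Their meaning is not
    // interpreted here; that is an open question in the thread.
    static final int PREFIX_LENGTH = 4;

    // Skips the 4-byte prefix and inflates the remainder (step 4).
    // Assumption: the remainder is zlib-wrapped deflate data.
    static byte[] unpack(byte[] stream) throws DataFormatException {
        if (stream.length < PREFIX_LENGTH) {
            throw new IllegalArgumentException("stream shorter than the 4-byte prefix");
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stream, PREFIX_LENGTH, stream.length - PREFIX_LENGTH);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: the data is
                // truncated or needs a preset dictionary.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native zlib resources
        }
    }
}
```

The returned bytes would correspond to "_1277043057_decompressed", which step 5 then opens as a POIFSFileSystem. The reverse direction is deliberately left out, because it would require knowing what the 4 stripped bytes have to contain.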