From: "Javen O'Neal"
Date: Tue, 2 May 2017 22:59:29 -0700
Subject: Re: Removing Macros from a document
To: POI Users List
Reply-To: "POI Users List"
Mailing-List: contact user-help@poi.apache.org; run by ezmlm
Content-Type: text/plain; charset=UTF-8
archived-at: Wed, 03 May 2017 06:00:13 -0000

Hey, PunKeel, this is great!

If your software is based on POI and you'd like to upstream some of your
changes to POI to make your library more straightforward, send us a pull
request and we'll review it, give feedback, and commit it.

I have barely dabbled in how VBA projects are saved in the OLE2 formats
(VBAMacroReader class), but perhaps others have some ideas and a few free
cycles to spare (being a volunteer project and all).

Keep in mind that PPT files save macros in a different part of the OLE2
file structure than XLS and DOC. A reminder that these binary formats
weren't developed by the same software team at the same time.

The following bugs might be of interest to you:
https://bz.apache.org/bugzilla/buglist.cgi?bug_id=52949%2C59302%2C60273%2C59830%2C59858%2C60158&list_id=159842

Feel free to continue this discussion over on dev@poi.apache.org, where
there might be a better technical audience who could point out some of the
POI internals. Most of the POI devs monitor both mailing lists, so which
mailing list you use probably doesn't matter too much.

Javen

On Tue, May 2, 2017 at 6:19 PM, PunKeel wrote:
> Hello,
>
> I've found the solution for Excel files: removing the "ObProj" record of
> the Workbook.
>
> The code I'm using is available here:
> https://gist.github.com/PunKeel/0e72ccde78cb0150383a9ced094c2bce
> I don't think this is the cleanest way to achieve it, so I'm open to
> suggestions.
>
> It also doesn't work for Word/PPT files. If I understand correctly (let's
> hope I do), the
> (Current User, PowerPoint Document) and (WordDocument, 0Table, 1Table)
> streams
> need to be edited by hand because Apache POI is lacking the APIs for these
> formats.
> ~> How? Are they like "DocumentStreams"? May I use the "RecordFactory"?
>
> Am I right? I am more than open to suggestions/help, please!
>
> Best regards,
>
>
> On 1 May 2017, at 6:15 PM, PunKeel wrote:
>
> Hello,
>
> I am currently building open-source software to disarm Office files,
> named DocBleach [1],
> but I am stuck on some specifics of the OLE2 format.
>
> First of all, I would like to thank you for the great library that is
> Apache POI!
>
> When opening disarmed OLE2 files in Office Excel/Word on Windows 10
> (haven't checked other versions),
> an alert is displayed, depending on the editor:
>
> - Excel says that the file is corrupted and needs to be repaired. The error:
> "Lost Visual Basic project".
> ~> Once repaired, the Macro Viewer is unable to tell us the name of the
> Macros
>
> - Word tells us that the file is unsafe because it contains Macros.
> ~> The "Macro Viewer" is able to tell us the name of the Macros
>
> ----
>
> As you know, OLE2 files are "file systems in a file".
> In order to remove the Macros of a document, I remove the Macros
> "directory".
>
> Sample log, for the record. (The process being the same for Word/Excel, I'll
> only give one).
> Relevant code is available at [2]
> Sample files, with their sanitised form (named "-free")
> ~> https://www.mediafire.com/folder/yh122tgbyzadw/Sample_2017_May_1
>
> #####
>
> $ java -jar docbleach.jar -vv -in ../Goodware/macro.doc -out - > macro-free.doc
> [main] DEBUG xyz.docbleach.cli.Main - Log Level: TRACE
> [main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: OLE2 Bleach
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries before: [CompObj, 1Table, SummaryInformation, WordDocument, DocumentSummaryInformation, Macros]
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Root ClassID: {00020906-0000-0000-C000-000000000046}
>
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: SummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: DocumentSummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: WordDocument, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: 1Table, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
>
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries after: [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]
>
> #####
>
> The CompObj and Macros entries are removed (not copied), so the Macros
> *can't* work.
>
> I've been trying a lot of things, especially with Excel files (they only
> contain a Workbook,
> SummaryInformation and DocumentSummaryInformation) and I've found out the
> Workbook
> was at fault: the two summaries did not contain the "macro reference", and I
> recreate the file
> from scratch so it has to be in an entry.
>
> If I understand correctly, there are "entries" in the Workbook/WordDocument
> holding the Macros.
> I found "stwUser" in the Word documentation [3], and I assume that I need to
> remove it, but couldn't
> find a unified API to achieve it for Word/Excel/PowerPoint documents.
>
> My question: is there an "easy" API to interact with these entries, removing
> parts of them?
> If so, could you please give me some leads/examples on how to do it?
> If not, do you have tips on how to achieve something similar?
>
> I could iterate over the Workbook/Document to copy it over manually, without
> the Macros…
> but if the API allowing it is not unified, I would have to do it for
> XLS/Word/PPT files, right?
> Doesn't seem like the easy path! :-(
>
> Thanks in advance!
>
> - PunKeel
>
> [1]: https://github.com/docbleach/DocBleach
> [2]:
> https://github.com/docbleach/DocBleach/blob/master/module/module-office/src/main/java/xyz/docbleach/module/ole2/OLE2Bleach.java
> [3]: https://msdn.microsoft.com/en-us/library/dd923194(v=office.12).aspx
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
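
For anyone following the thread, the idea of removing the "ObProj" record from the Workbook stream can be sketched at the byte level in plain Java. This is only an illustration under stated assumptions: BiffRecordFilter, dropRecords and OBPROJ_SID are hypothetical names (not Apache POI or DocBleach API), and the ObProj record type 0x00D3 is taken from the [MS-XLS] record list and should be double-checked against the spec. The sketch relies on the BIFF framing of each record as a 2-byte little-endian type, a 2-byte little-endian size, then that many data bytes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper for illustration; not part of Apache POI or DocBleach. */
final class BiffRecordFilter {

    /** Assumed record type of ObProj per [MS-XLS]; verify before relying on it. */
    static final int OBPROJ_SID = 0x00D3;

    private BiffRecordFilter() {
    }

    /**
     * Copies a BIFF record stream, skipping every record whose type equals sidToDrop.
     * Record framing: type (2 bytes, little-endian), size (2 bytes, little-endian), data.
     */
    static byte[] dropRecords(byte[] stream, int sidToDrop) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(stream.length);
        int pos = 0;
        while (pos + 4 <= stream.length) {
            int sid = (stream[pos] & 0xFF) | ((stream[pos + 1] & 0xFF) << 8);
            int size = (stream[pos + 2] & 0xFF) | ((stream[pos + 3] & 0xFF) << 8);
            int end = pos + 4 + size;
            if (end > stream.length) {
                throw new IllegalArgumentException("Truncated record at offset " + pos);
            }
            if (sid != sidToDrop) {
                // Keep the record header and its data unchanged.
                out.write(stream, pos, end - pos);
            }
            pos = end;
        }
        // Copy any trailing bytes that are too short to form a record header.
        out.write(stream, pos, stream.length - pos);
        return out.toByteArray();
    }
}
```

Note that this only filters bytes. BoundSheet records store absolute stream offsets to each sheet's BOF record, and those offsets shift once a record in front of the sheets is removed, so the output is not a valid Workbook stream unless they are rewritten too. Something like dropRecords(workbookBytes, BiffRecordFilter.OBPROJ_SID) shows the mechanics only.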