From: Chris Bamford
To: POI Users List
Subject: Re: Extracting embedded files from HWPF docs
Date: Mon, 10 Jun 2013 12:21:39 +0000

Hi again Nick,

This problem appears to be Mac-specific: I have had more luck with a .doc file created natively in Windows :-)
Now POIFSLister shows the ObjectPool and the item in it:

Root Entry -
  SummaryInformation <(0x05)SummaryInformation> [412 / 0x19c]
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [280 / 0x118]
  WordDocument [4142 / 0x102e]
  1Table [2087 / 0x827]
  ObjectPool -
    _1432368106 -
      CompObj <(0x01)CompObj> [76 / 0x4c]
      ObjInfo <(0x03)ObjInfo> [6 / 0x6]
      Ole10Native <(0x01)Ole10Native> [568849 / 0x8ae11]
      EPRINT <(0x03)EPRINT> [5000 / 0x1388]
  CompObj <(0x01)CompObj> [113 / 0x71]
  Data [4096 / 0x1000]

Please can you point me to any resources which could help me to save the embedded file to another file (i.e. read all the bytes and save them somewhere)?
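
To make the question concrete, here is a minimal sketch of one way the bytes might be pulled out, given the layout in the listing above. It assumes the embedded file is stored as an OLE package, i.e. each storage under ObjectPool carries an Ole10Native stream that POI's `Ole10Native` class can parse. The class name `EmbeddedObjectDump`, the method `dumpAll` and the `.bin` suffix are made up for illustration and are not part of POI. Objects embedded in some other form (for example a nested Excel or Word document) would not be picked up by this.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Iterator;

import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.Ole10Native;
import org.apache.poi.poifs.filesystem.Ole10NativeException;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper (not part of POI): saves the payload of every
// Ole10Native stream found one level below the ObjectPool storage.
public class EmbeddedObjectDump {

    // Returns the number of payloads written into outDir.
    public static int dumpAll(InputStream doc, File outDir)
            throws IOException, Ole10NativeException {
        POIFSFileSystem fs = new POIFSFileSystem(doc);
        DirectoryEntry root = fs.getRoot();

        // Documents without embedded objects have no ObjectPool at all.
        if (!root.hasEntry("ObjectPool")) {
            return 0;
        }
        Entry pool = root.getEntry("ObjectPool");
        if (!(pool instanceof DirectoryNode)) {
            return 0;
        }

        int saved = 0;
        Iterator<Entry> children = ((DirectoryNode) pool).getEntries();
        while (children.hasNext()) {
            Entry child = children.next();
            if (!(child instanceof DirectoryNode)) {
                continue;
            }
            DirectoryNode objectDir = (DirectoryNode) child;

            // Assumption: only OLE "package" objects are handled here.
            if (!objectDir.hasEntry(Ole10Native.OLE10_NATIVE)) {
                continue;
            }

            // Ole10Native skips the stream's header (label, file name,
            // command) and exposes the embedded file's raw bytes.
            Ole10Native ole = Ole10Native.createFromEmbeddedOleObject(objectDir);

            // The storage name (e.g. _1432368106) is used as the file name;
            // the name recorded inside the stream may contain a full path.
            File target = new File(outDir, objectDir.getName() + ".bin");
            try (OutputStream out = new FileOutputStream(target)) {
                out.write(ole.getDataBuffer());
            }
            saved++;
        }
        return saved;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the .doc file, args[1]: existing output directory
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(dumpAll(in, new File(args[1])) + " object(s) saved");
        }
    }
}
```

Run against the Windows-created file, this would be expected to write a single file named after the `_1432368106` storage, but that has not been checked against the actual document.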

Thanks,

- Chris

On 10 Jun 2013, at 09:33, Chris Bamford wrote:

> Hi Nick,
>
> I created a .doc file with an embedded MP3 (that is, I dragged an MP3 file from Finder and dropped it into the document whereupon Word displayed a small image of a loudspeaker - I took this as a positive sign!).
> I then added some text for good measure and saved it, taking care to save it as "Word 97 - 2004".
> Then I ran POIFSLister -sizes on it and got:
>
> Root Entry -
>   SummaryInformation <(0x05)SummaryInformation> [4096 / 0x1000]
>   DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [4096 / 0x1000]
>   WordDocument [9152 / 0x23c0]
>   1Table [7280 / 0x1c70]
>   CompObj <(0x01)CompObj> [96 / 0x60]
>
> Looking closer in the debugger, I discovered that none of the entries shown are of type DirectoryNode, so I cannot even start the process of finding / extracting the MP3.
> Any ideas what I might be doing wrong?
> Thanks,
>
> - Chris
>
> On 7 Jun 2013, at 14:26, Chris Bamford wrote:
>
> Thanks Nick, must have missed that. Will check it out.
>
> Chris
>
> On 7 Jun 2013, at 14:12, Nick Burch wrote:
>
>> On Fri, 7 Jun 2013, Chris Bamford wrote:
>>> Is there a way to extract files embedded into Word docs (.doc, not .docx), using the HWPF package?
>>
>> Does the information on http://poi.apache.org/poifs/embeded.html not cover what you need?
>>
>> Nick
>
> Chris Bamford
> Senior Developer
>
> CityPoint,
> One Ropemaker Street,
> London,
> EC2Y 9AW.