From: bugzilla@apache.org
To: dev@poi.apache.org
Subject: [Bug 54213] Exception parsing XLS embedded in PPT file
Date: Wed, 28 Nov 2012 08:47:34 +0000
X-Bugzilla-Product: POI
X-Bugzilla-Component: HSLF
X-Bugzilla-Severity: normal
X-Bugzilla-Who: yegor@dinom.ru
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: dev@poi.apache.org

https://issues.apache.org/bugzilla/show_bug.cgi?id=54213

--- Comment #2 from Yegor Kozlov ---
The raw object is an MSGraph.Chart, not an Excel workbook. Don't be misled by the stream name "Workbook"; it is just a naming convention. The MSGraph.Chart format is derived from BIFF8. The content stream consists of records, but the structure and length of those records *CAN* be totally different from their counterparts in the binary .xls format.

For example, the POI-HSSF parser identifies a record with sid=0x3d as WindowOneRecord and expects it to consist of nine shorts, 18 bytes in total (9 fields of 2 bytes each). The MSGraph.Chart format is different: depending on the position of WindowOneRecord in the stream, it can be either 18 bytes (nine two-byte fields) or 10 bytes (five two-byte fields); see section 2.4.104 in [MS-OGRAPH].pdf.

I found similar discrepancies for SelectionRecord (0x001D) and LinkedDataRecord (0x1051).

All this means that using HSSF to parse MSGraph.Chart is not quite correct. It is a special case, and you need a special parser to handle it.

What information do you need to extract from embedded charts? Series text and data labels? What else?

I'm thinking of a special record factory and an event-driven parser that will read only specific bits of data. We may need to extend the current API to support it.

Regards,
Yegor
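
The variable record length described above can be sketched as a small event-driven scanner. This is only an illustration of the idea, not POI code: the class GraphRecordScanner, the RecordListener interface, and the helper methods are hypothetical names. It assumes the stream uses the usual BIFF-style record header (a two-byte little-endian sid followed by a two-byte little-endian payload length) and takes each record's size from that header instead of assuming a fixed layout per sid.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the POI API.
final class GraphRecordScanner {

    /** Callback invoked once per record found in the stream. */
    interface RecordListener {
        void onRecord(int sid, byte[] payload);
    }

    /** sid that HSSF maps to WindowOneRecord (0x3d in the comment above). */
    static final int SID_WINDOW1 = 0x003D;

    /**
     * Walks the stream record by record. The payload size comes from the
     * record header, so an 18-byte and a 10-byte record with the same sid
     * are both accepted.
     */
    static void scan(byte[] stream, RecordListener listener) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;
            int len = buf.getShort() & 0xFFFF;
            if (len > buf.remaining()) {
                throw new IllegalArgumentException(
                        "Truncated record, sid=0x" + Integer.toHexString(sid));
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            listener.onRecord(sid, payload);
        }
    }

    /**
     * Splits a payload into two-byte little-endian fields. Values are read
     * as unsigned here for simplicity; real fields may be signed.
     */
    static int[] readShortFields(byte[] payload) {
        if (payload.length % 2 != 0) {
            throw new IllegalArgumentException("Odd payload length: " + payload.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        int[] fields = new int[payload.length / 2];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = buf.getShort() & 0xFFFF;
        }
        return fields;
    }

    /** Returns the number of two-byte fields in each 0x3d record, in stream order. */
    static List<Integer> windowOneFieldCounts(byte[] stream) {
        List<Integer> counts = new ArrayList<>();
        scan(stream, (sid, payload) -> {
            if (sid == SID_WINDOW1) {
                counts.add(readShortFields(payload).length);
            }
        });
        return counts;
    }
}
```

A listener that cares only about series text or data labels would filter on those sids in the same way and ignore everything else, which is roughly what "read only specific bits of data" suggests.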