poi-dev mailing list archives

From bugzi...@apache.org
Subject [Bug 54213] Exception parsing XLS embedded in PPT file
Date Wed, 28 Nov 2012 08:47:34 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=54213

--- Comment #2 from Yegor Kozlov <yegor@dinom.ru> ---
The raw object is an MSGraph.Chart, not an Excel workbook. Don't be misled by the
stream name "Workbook"; it is just a format convention.

The MSGraph.Chart format is a derivative of BIFF8. The content stream
consists of records, but the structure and length of the records *CAN* be
totally different from their analogues in the binary .xls format.
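To make this concrete, here is a minimal Java sketch (plain java.nio, not POI code) of a record walker over such a stream. It assumes the usual BIFF8 record header: a 2-byte sid followed by a 2-byte payload length, both little-endian. `RecordWalker` and `RawRecord` are hypothetical names; the walker trusts the length in each header instead of assuming a fixed size per sid.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI. Assumes each record starts with a
// 4-byte header: 2-byte sid, then 2-byte payload length (little-endian).
final class RecordWalker {

    // One raw record: its sid and an undecoded copy of its payload.
    record RawRecord(int sid, byte[] data) {}

    static List<RawRecord> walk(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<RawRecord> out = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int sid = Short.toUnsignedInt(buf.getShort());
            int len = Short.toUnsignedInt(buf.getShort());
            // Trust the header length, but refuse to read past the stream end.
            if (len > buf.remaining()) {
                throw new IllegalArgumentException(
                    "record 0x" + Integer.toHexString(sid) + " claims " + len
                    + " bytes but only " + buf.remaining() + " remain");
            }
            byte[] data = new byte[len];
            buf.get(data);
            out.add(new RawRecord(sid, data));
        }
        return out;
    }
}
```

Because the walker relies only on the header, it stays in step even when a record's payload does not match its .xls layout; interpreting each payload is left to whoever consumes the records.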

For example, the POI-HSSF parser recognizes a record with sid=0x3D as a
WindowOneRecord and expects it to consist of nine shorts, i.e. 18 bytes
(9 fields of 2 bytes each).

The MSGraph.Chart format is different: depending on the position of
WindowOneRecord in the stream, it can be either 18 bytes (nine two-byte fields)
or 10 bytes (five two-byte fields); see section 2.4.104 in [MS-OGRAPH].pdf.
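A hedged sketch of the difference, in plain Java rather than HSSF: `decodeStrict` mimics the fixed nine-field expectation, while `decodeFields` accepts either size mentioned above and returns the raw 16-bit fields. The class and method names are hypothetical, and no field meanings are assigned because they depend on which variant is present.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI: decodes a Window1 payload (sid 0x003D)
// as a sequence of little-endian 16-bit fields.
final class LenientWindowOne {

    // What a strict .xls-style reader expects: always nine fields (18 bytes).
    static short[] decodeStrict(byte[] payload) {
        if (payload.length != 18) {
            throw new IllegalArgumentException(
                "expected 18 bytes for Window1, got " + payload.length);
        }
        return toShorts(payload);
    }

    // Chart-aware variant: accept the 18-byte (nine-field) or
    // 10-byte (five-field) form and leave field meanings to the caller.
    static short[] decodeFields(byte[] payload) {
        if (payload.length != 18 && payload.length != 10) {
            throw new IllegalArgumentException(
                "unexpected Window1 size: " + payload.length + " bytes");
        }
        return toShorts(payload);
    }

    // Reads payload.length / 2 little-endian shorts in order.
    private static short[] toShorts(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        short[] fields = new short[payload.length / 2];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = buf.getShort();
        }
        return fields;
    }
}
```

With a 10-byte payload, `decodeStrict` throws while `decodeFields` returns five values, which is roughly the failure mode described in this bug.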

I found similar discrepancies for SelectionRecord (0x001D) and LinkedDataRecord
(0x1051).  

All this means that using HSSF to parse MSGraph.Chart is not quite correct. It
is a special case, and it needs a special parser.

What information do you need to extract from embedded charts? Series text and
data labels? What else?

I'm thinking of a special record factory and an event-driven parser that would
read only specific bits of data. We may need to extend the current API to
support it.
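A rough sketch of what such an event-driven reader could look like, assuming the same 4-byte record header as BIFF8. `SelectiveChartReader` and its `on`/`read` methods are hypothetical, not existing POI API: callers register a handler per sid, only those payloads are handed out, and every other record is skipped using its header length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event-driven reader for an MSGraph.Chart stream; not POI API.
final class SelectiveChartReader {
    private final Map<Integer, Consumer<ByteBuffer>> handlers = new HashMap<>();

    // Register interest in one sid; returns this for chaining.
    SelectiveChartReader on(int sid, Consumer<ByteBuffer> handler) {
        handlers.put(sid, handler);
        return this;
    }

    void read(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int sid = Short.toUnsignedInt(buf.getShort());
            int len = Short.toUnsignedInt(buf.getShort());
            if (len > buf.remaining()) {
                throw new IllegalArgumentException(
                    "truncated record 0x" + Integer.toHexString(sid));
            }
            Consumer<ByteBuffer> handler = handlers.get(sid);
            if (handler != null) {
                // Give the handler a little-endian view bounded to this payload.
                ByteBuffer payload = buf.slice().order(ByteOrder.LITTLE_ENDIAN);
                payload.limit(len);
                handler.accept(payload);
            }
            // Advance past the payload whether or not it was handled.
            buf.position(buf.position() + len);
        }
    }
}
```

For example, `new SelectiveChartReader().on(0x3D, p -> ...).read(bytes)` would see only Window1 payloads. Each handler gets a bounded view, so a record whose layout differs from what the handler expects cannot push the reader out of step.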

Regards,
Yegor

-- 
You are receiving this mail because:
You are the assignee for the bug.


