poi-user mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: Invalid header for xls: 0x0010000000060409?
Date Tue, 25 Nov 2014 14:20:19 GMT
On Mon, 24 Nov 2014, Allison, Timothy B. wrote:
> I recently ran Tika against the ~1 million files in govdocs1.  Nearly 
> 91% (2,579/2,828) of the XLS exceptions via Tika 1.7 are the following. 
> Tika is detecting these as XLS and then the header exception is thrown.

You need to read that value backwards, byte by byte, to see the pattern: 
the file starts with the bytes 0x09 0x04 0x06.
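
As a rough illustration of that reversal (a sketch only, not POI's own code; the class and method names are made up for this example), and assuming the value in the exception is the first eight bytes of the file read as a little-endian long, writing it back out in little-endian order recovers the bytes as they sit on disk:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI or Tika.
public class HeaderBytes {

    // Turn the long from the exception message back into file-order bytes,
    // assuming it was originally read as a little-endian 64-bit value.
    static String toFileOrderHex(long signature) {
        byte[] bytes = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(signature)
                .array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "09 04 06 00 00 00 10 00"
        System.out.println(toFileOrderHex(0x0010000000060409L));
    }
}
```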

> Does this header ring any bells?  Old version of XLS, perhaps?  The 
> triggering files open in Excel and I think I see that they are "Excel 
> 4".

Sounds like one of the very old, pre-OLE2 versions.

Looking at the OpenOffice documentation, under sections 2.2 and 2.3:
http://www.openoffice.org/sc/excelfileformat.pdf

That suggests that Excel 5 onwards (5, 95, 97 etc) used OLE2, so that'd 
mean these files are from Excel 1 through Excel 4.

> I can't get the link to work, but one triggering file is 004444.xls.

If you can get that file out, and raise a JIRA, then we can look to add in 
magic to correctly detect/handle those files!
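
A minimal sketch of what such magic could look like, assuming the BOF record identifiers listed in the OpenOffice document (0x0009, 0x0209 and 0x0409 for BIFF2, BIFF3 and BIFF4) and that a pre-OLE2 file begins directly with its BOF record. The class and method names are hypothetical, not an existing POI or Tika API:

```java
// Hypothetical sniffer, not part of POI or Tika.
public class OldExcelSniffer {

    // Guess the BIFF version from the first four bytes of a file:
    // a little-endian record id followed by a little-endian record length.
    static String guessBiffVersion(byte[] head) {
        if (head == null || head.length < 4) {
            return "unknown";
        }
        int recordId = (head[0] & 0xFF) | ((head[1] & 0xFF) << 8);
        int recordLength = (head[2] & 0xFF) | ((head[3] & 0xFF) << 8);
        if (recordLength == 0) {
            // A BOF record carries at least a version and a type field.
            return "unknown";
        }
        switch (recordId) {
            case 0x0009: return "BIFF2";
            case 0x0209: return "BIFF3";
            case 0x0409: return "BIFF4";
            default:     return "unknown";
        }
    }

    public static void main(String[] args) {
        // The leading bytes reported for the govdocs1 files in this thread.
        byte[] head = {0x09, 0x04, 0x06, 0x00};
        System.out.println(guessBiffVersion(head));
    }
}
```

Real detection would want to check more than four bytes (for example the version and type fields inside the BOF record) to avoid false positives.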

Nick
