any23-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Observation with office-scraper plugin
Date Tue, 02 Jul 2013 19:47:32 GMT
Hi,
Today, for the first time, I had a use case for the office-scraper plugin [0].
The command line tools come in pretty handy here, and I made the following
observation.
If you are working with xls (older) or xlsx (newer, 2007-2010) formats, the
files need to be ***originally*** written in Microsoft Excel. I can only
assume that this is because the mimetype metadata is written and maintained
based on the original editor.
For example, I created two Excel documents in LibreOffice (ouch), as I am
using Ubuntu... I saved them to my desktop and ran

law@CEE279Law3-Linux:~/Desktop$ any23 mimes file:///home/law/spec_table.xls
Display all 190 possibilities? (y or n)
Linux:~/Desktop$ any23 mimes file:///home/law/Desktop/spec_table.xls

------------------------------------------------------------------------
Apache Any23 :: mimes
------------------------------------------------------------------------

application/x-tika-msoffice

------------------------------------------------------------------------
Apache Any23 SUCCESS
Total time: 0s
Finished at: Tue Jul 02 12:37:20 PDT 2013
Final Memory: 25M/479M
------------------------------------------------------------------------
Linux:~/Desktop$ any23 mimes file:///home/law/Desktop/spec_table.xlsx

------------------------------------------------------------------------
Apache Any23 :: mimes
------------------------------------------------------------------------

application/x-tika-ooxml

------------------------------------------------------------------------
Apache Any23 SUCCESS
Total time: 0s
Finished at: Tue Jul 02 12:37:29 PDT 2013
Final Memory: 25M/479M
------------------------------------------------------------------------
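
One possible reading of the two results above, offered as an assumption rather than something verified against Any23 or Tika: both reported types look like generic container types, which is what you get when detection goes no further than the leading bytes of the file. Legacy Office files share the OLE2 signature and OOXML files are ZIP archives, so the first bytes alone cannot tell a spreadsheet from a text document. A minimal sketch of that kind of sniffing (the function name and the returned labels are hypothetical, not part of any library):

```python
# Leading bytes shared by every file of each container family.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls, .doc, .ppt
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx, .docx, any other ZIP


def sniff_container(header):
    """Return a coarse container label based only on the leading bytes.

    Hypothetical helper for illustration; the labels are made up.
    """
    if header.startswith(OLE2_MAGIC):
        return "ole2-container"
    if header.startswith(ZIP_MAGIC):
        return "zip-container"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```

Calling `sniff_file` on the two spreadsheets would show whether they carry the usual OLE2 and ZIP signatures, which is all that a signature check can establish.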

When I do

Linux:~/Desktop$ any23 verify ~/.any23/plugins
------------------------------------------------------------------------
Apache Any23 :: verify
------------------------------------------------------------------------

Plugin author    : <unknown>
Plugin factory   : class org.apache.any23.plugin.officescraper.ExcelExtractorFactory
Plugin mime-types: application/vnd.ms-excel;q=0.1 application/msexcel;q=0.1 application/x-msexcel;q=0.1 application/x-ms-excel;q=0.1
------------------------------------------------------------------------

Neither of the types detected for my two files is in that list, so the
plugin will ***only*** work with documents detected as one of
application/vnd.ms-excel;q=0.1 application/msexcel;q=0.1
application/x-msexcel;q=0.1 application/x-ms-excel;q=0.1
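
The mismatch can be sketched in a few lines. This assumes the plugin is picked by an exact match between the detected type and the declared list, which I have not checked in the Any23 source; `parse_mime_types` and `is_supported` are hypothetical helpers, not Any23 API.

```python
def parse_mime_types(declaration):
    """Map each MIME type in a space-separated declaration to its q-value.

    Entries look like 'application/vnd.ms-excel;q=0.1'; a missing q
    parameter defaults to 1.0.
    """
    result = {}
    for entry in declaration.split():
        mime, _, params = entry.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip() == "q" and value:
                q = float(value)
        result[mime.strip().lower()] = q
    return result


# The list reported by 'any23 verify' in this message.
SUPPORTED = parse_mime_types(
    "application/vnd.ms-excel;q=0.1 application/msexcel;q=0.1 "
    "application/x-msexcel;q=0.1 application/x-ms-excel;q=0.1"
)


def is_supported(detected):
    """Assumed rule: the detected type must appear verbatim in the list."""
    return detected.strip().lower() in SUPPORTED


for detected in (
    "application/x-tika-msoffice",  # reported for spec_table.xls
    "application/x-tika-ooxml",     # reported for spec_table.xlsx
    "application/vnd.ms-excel",     # a type the plugin declares
):
    print(detected, is_supported(detected))
```

Under that assumption, the first two types come back unsupported and only the third matches.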

So I am running between the library and my office punching in trivial
spreadsheets to achieve what I want to do... the joys.

Thanks
Lewis

[0] http://s.apache.org/UaG

-- 
*Lewis*
