incubator-odf-dev mailing list archives

From Devin Han <>
Subject Re: Tika is waiting for ODFToolkit to improve ODF file format processing
Date Wed, 07 Dec 2011 08:04:40 GMT
Tika has fixed this issue[1] in Tika 1.0[2].
But we still need to keep an eye on Tika, and on the memory-optimized
streaming API for read-only, single-pass processing.

Anyway, let's speed up the initial release process.

BTW: Would anyone volunteer to do some pre-work for the streaming API?
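As a possible starting point for that pre-work, here is a minimal sketch of what "read-only, single pass" could mean in practice. It is NOT an ODF Toolkit API: it uses only the JDK (java.util.zip plus SAX), and the class StreamingOdfText and its extract() method are hypothetical names. It assumes the body text we care about lives in text:p / text:h elements of content.xml, and it only special-cases text:s, text:tab and text:line-break.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, NOT an ODF Toolkit API: single-pass, read-only text
// extraction that streams the ODF package and SAX-parses content.xml without
// building a DOM tree.
public final class StreamingOdfText {
    private static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    public static String extract(InputStream odfPackage)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so callbacks receive namespace URI + local name
        SAXParser parser = factory.newSAXParser();
        StringBuilder out = new StringBuilder();
        ZipInputStream zip = new ZipInputStream(odfPackage);
        ZipEntry entry;
        // Walk the ZIP entries once, in stored order, until content.xml turns up.
        while ((entry = zip.getNextEntry()) != null) {
            if ("content.xml".equals(entry.getName())) {
                // ZipInputStream hits EOF at the end of the current entry,
                // so the parser only ever sees content.xml.
                parser.parse(zip, new TextHandler(out));
                break;
            }
        }
        return out.toString();
    }

    // Appends character data found inside text:p / text:h, one line per paragraph.
    private static final class TextHandler extends DefaultHandler {
        private final StringBuilder out;
        private int paragraphDepth = 0; // > 0 while inside a text:p or text:h

        TextHandler(StringBuilder out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (!TEXT_NS.equals(uri)) {
                return;
            }
            switch (local) {
                case "p":
                case "h":
                    paragraphDepth++;
                    break;
                case "tab":
                    out.append('\t');
                    break;
                case "line-break":
                    out.append('\n');
                    break;
                case "s": { // text:s stands for text:c spaces (default 1)
                    String count = atts.getValue(TEXT_NS, "c");
                    int n = (count == null) ? 1 : Integer.parseInt(count);
                    for (int i = 0; i < n; i++) {
                        out.append(' ');
                    }
                    break;
                }
                default:
                    break;
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (TEXT_NS.equals(uri) && ("p".equals(local) || "h".equals(local))) {
                paragraphDepth--;
                out.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (paragraphDepth > 0) {
                out.append(ch, start, length);
            }
        }
    }
}
```

Because no DOM is built, memory use is dominated by the extracted text rather than by a full XML tree. The trade-off is that anything needing random access (resolving styles, following references into other parts) would need another pass or a different design.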


2011/10/24 Devin Han <>

> I saw this issue in Tika: "OpenOffice parser: master footer text isn't
> extracted".
> The current ODF parser of Tika doesn't touch the styles part or any
> embedded documents, only meta and content. They are waiting for the first
> ODF Toolkit incubating release, and will then switch to a full-featured
> parser, much as they have for the POI-powered ones.
> The first release is coming, and we will have no code updates before it.
> So I suggest we start discussing how to use ODF Toolkit to implement this,
> based on the snapshot.
> This feature concerns ODFDOM and the Simple ODF API. We have included text
> extraction in the cookbook and demo, see:
> The work we need to do:
> (1) What are Tika's detailed requirements?
> (2) Can the existing features of ODF Toolkit cover Tika's requirements?
> (3) How can we use ODF Toolkit to implement it?
> CC'ing the Tika dev list, in case people on that list are interested in
> this issue.
> --
> -Devin
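For reference on the Tika issue itself: master footer text lives in styles.xml, under office:master-styles / style:master-page / style:footer, which a parser that reads only meta and content never opens. A rough sketch of pulling it out with plain SAX follows. It is not ODFDOM or Simple ODF API code; MasterFooterText and footers() are hypothetical names, it only looks at style:footer and style:footer-left, and it assumes the caller has already located the styles.xml entry in the package.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, NOT an ODF Toolkit API: collect the paragraph text of
// master-page footers from an ODF styles.xml stream in a single SAX pass.
public final class MasterFooterText {
    private static final String STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0";
    private static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Returns one string per footer paragraph (text:p / text:h), in document order.
    public static List<String> footers(InputStream stylesXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        List<String> result = new ArrayList<>();
        factory.newSAXParser().parse(stylesXml, new DefaultHandler() {
            private int footerDepth = 0;     // > 0 inside style:footer / style:footer-left
            private int paragraphDepth = 0;  // > 0 inside a footer text:p / text:h
            private StringBuilder paragraph; // non-null while collecting a footer paragraph

            private boolean isFooter(String uri, String local) {
                return STYLE_NS.equals(uri) && ("footer".equals(local) || "footer-left".equals(local));
            }

            private boolean isParagraph(String uri, String local) {
                return TEXT_NS.equals(uri) && ("p".equals(local) || "h".equals(local));
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (isFooter(uri, local)) {
                    footerDepth++;
                } else if (footerDepth > 0 && isParagraph(uri, local)) {
                    if (paragraphDepth++ == 0) {
                        paragraph = new StringBuilder(); // outermost footer paragraph starts
                    }
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (isFooter(uri, local)) {
                    footerDepth--;
                } else if (paragraphDepth > 0 && isParagraph(uri, local)) {
                    if (--paragraphDepth == 0) {
                        result.add(paragraph.toString()); // outermost paragraph closed
                        paragraph = null;
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (paragraph != null) {
                    paragraph.append(ch, start, length);
                }
            }
        });
        return result;
    }
}
```

Headers (style:header) are deliberately skipped here to stay close to the issue; extending isFooter() to the header elements would be the obvious next step if Tika wants both.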

