incubator-odf-dev mailing list archives

From Devin Han <devin...@apache.org>
Subject Re: Tika is waiting for ODFToolkit to improve ODF file format processing
Date Wed, 07 Dec 2011 08:04:40 GMT
Tika fixed this issue [1] in Tika 1.0 [2].
But we still need to keep an eye on Tika and on the memory-optimized
streaming API for read-only, single-pass processing.
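
To make the "read-only, single pass" idea concrete, here is a rough sketch
using only the JDK (java.util.zip plus StAX). It is not ODF Toolkit or Tika
code: the class name OdfStreamingTextSketch and its methods are made up for
illustration. It assumes the package stores its text in content.xml and
styles.xml and that only text:p and text:h elements matter, which is a
simplification. Reading styles.xml is what picks up master header and footer
text.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: single-pass, read-only text extraction from an ODF package. */
public class OdfStreamingTextSketch {

    private static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Keeps an XML parser from closing the shared zip stream when an entry ends. */
    private static final class NonClosing extends FilterInputStream {
        NonClosing(InputStream in) { super(in); }
        @Override public void close() { /* intentionally a no-op */ }
    }

    /** Walks the zip once and streams content.xml and styles.xml through StAX. */
    public static String extractText(InputStream odfPackage)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(odfPackage)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Assumption: only these two parts carry extractable text.
                if (name.equals("content.xml") || name.equals("styles.xml")) {
                    appendText(factory.createXMLStreamReader(new NonClosing(zip)), out);
                }
            }
        }
        return out.toString();
    }

    /** Appends character data found inside text:p / text:h, one line per block. */
    private static void appendText(XMLStreamReader reader, StringBuilder out)
            throws XMLStreamException {
        int depth = 0; // > 0 while inside a paragraph or heading
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT && isTextBlock(reader)) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT && isTextBlock(reader)) {
                depth--;
                out.append('\n');
            } else if (depth > 0 && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                out.append(reader.getText());
            }
        }
    }

    private static boolean isTextBlock(XMLStreamReader reader) {
        String local = reader.getLocalName();
        return TEXT_NS.equals(reader.getNamespaceURI())
                && ("p".equals(local) || "h".equals(local));
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

Nothing is held in memory beyond the extracted text, and no DOM is built. A
real streaming API would also have to deal with embedded documents, spaces
and tabs encoded as elements, and metadata, none of which this sketch does.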

Anyway, let's speed up the process for the initial release.

BTW: Would anyone volunteer to do some pre-work for the streaming API?


[1] https://issues.apache.org/jira/browse/TIKA-736
[2] http://tika.apache.org/1.0/index.html

2011/10/24 Devin Han <devinhan@apache.org>

> I saw this issue in Tika: OpenOffice parser: master footer text isn't
> extracted https://issues.apache.org/jira/browse/TIKA-736
>
> The current ODF parser in Tika doesn't touch the styles part or embedded
> documents, only meta and content. They are waiting for the first ODF
> Toolkit incubating release and will then switch to a full-featured parser,
> much as they have for the POI-powered ones.
>
> The first release is coming and we will have no code updates before it. So
> I suggest we start discussing how to use ODF Toolkit to implement this,
> based on the snapshot.
>
> This feature concerns ODFDOM and the Simple ODF API. We have covered text
> extraction in the cookbook and demo, see:
>
>
> http://incubator.apache.org/odftoolkit/simple/document/cookbook/TextExtractor.html
> http://incubator.apache.org/odftoolkit/simple/demo/demo2.html
>
> The work we need to do:
> (1) What are Tika's detailed requirements?
> (2) Can the existing features of ODF Toolkit cover Tika's requirements?
> (3) How do we use ODF Toolkit to implement it?
>
> CC to the Tika dev list, in case people on that list are interested in
> this issue.
> --
> -Devin
>



-- 
-Devin
