incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "TikaProposal" by MarkHarwood
Date Mon, 05 Mar 2007 14:22:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The following page has been changed by MarkHarwood:

   * [ Apache Lucene] - The analysis part of Lucene contains
code that might overlap with some of the potential Tika functionality. There might also be
some overlap regarding the Document model in Lucene.
   * [ Lucene Nutch] - The Nutch project already contains a
parser framework that does many of the things that Tika is designed to do.
   * [ Apache Jackrabbit] - The Jackrabbit project contains a
text extraction component that also implements a subset of the proposed Tika features.
+  *  [ Apache UIMA] - The UIMA project provides a framework
and pluggable tools for analyzing text content and extracting information. Example tools include
language identification, sentence boundary detection and "entity extraction" - finding references
to people, places and organisations. Tika could be used by UIMA to parse text, but Tika should
be careful not to duplicate the subsequent text analysis features UIMA offers.
   * ''TODO: Other projects? Solr? The Droids lab?''
  === An Excessive Fascination with the Apache Brand ===
