incubator-cvs mailing list archives

From ju...@apache.org
Subject svn commit: r705400 - in /incubator/public/trunk: site-author/projects/tika.xml site-publish/projects/tika.html
Date Thu, 16 Oct 2008 23:19:51 GMT
Author: jukka
Date: Thu Oct 16 16:19:50 2008
New Revision: 705400

URL: http://svn.apache.org/viewvc?rev=705400&view=rev
Log:
Updated Tika status page

Modified:
    incubator/public/trunk/site-author/projects/tika.xml
    incubator/public/trunk/site-publish/projects/tika.html

Modified: incubator/public/trunk/site-author/projects/tika.xml
URL: http://svn.apache.org/viewvc/incubator/public/trunk/site-author/projects/tika.xml?rev=705400&r1=705399&r2=705400&view=diff
==============================================================================
--- incubator/public/trunk/site-author/projects/tika.xml (original)
+++ incubator/public/trunk/site-author/projects/tika.xml Thu Oct 16 16:19:50 2008
@@ -22,6 +22,10 @@
     <section id="News">
       <title>News</title>
       <ul>
+        <li><strong>2008-10-01:</strong> Dave Meikle added as committer</li>
+        <li><strong>2008-04-07:</strong> Niall Pemberton added as committer</li>
+        <li><strong>2007-12-27:</strong> Apache Tika 0.1-incubating released</li>
+        <li><strong>2007-10-02:</strong> Keith Bennett added as committer</li>
         <li><strong>2007-03-22:</strong> Apache Tika begins incubation</li>
       </ul>
     </section>
@@ -58,12 +62,12 @@
         <tr>
           <td>Bug tracking</td>
           <td>jira</td>
-          <td><a href="http://issues.apache.org/jira/browse/TIKA">TIKA</a></td>
+          <td><a href="https://issues.apache.org/jira/browse/TIKA">TIKA</a></td>
         </tr>
         <tr>
           <td>Source code</td>
           <td>svn</td>
-          <td><a href="http://svn.apache.org/repos/asf/incubator/tika/">http://svn.apache.org/repos/asf/incubator/tika/</a></td>
+          <td><a href="https://svn.apache.org/repos/asf/incubator/tika/">https://svn.apache.org/repos/asf/incubator/tika/</a></td>
         </tr>
         <tr>
           <td>Sponsor</td>
@@ -87,7 +91,7 @@
         </tr>
         <tr>
           <td>Committers</td>
-          <td></td>
+          <td>ridabenjelloun</td>
           <td>Rida Benjelloun</td>
         </tr>
         <tr>
@@ -115,10 +119,289 @@
           <td>kbennett</td>
           <td>Keith Bennett</td>
         </tr>
+        <tr>
+          <td></td>
+          <td>niallp</td>
+          <td>Niall Pemberton</td>
+        </tr>
+        <tr>
+          <td></td>
+          <td>dmeikle</td>
+          <td>Dave Meikle</td>
+        </tr>
       </table>
     </section>
     <section id="Incubation+status+reports">
       <title>Incubation status reports</title>
+      <section id="October 2008">
+        <title>October 2008</title>
+        <p>
+          Apache Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing parser
+          libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <ul>
+          <li>Dave Meikle was just voted in as a new committer.</li>
+          <li>Paolo Mottadelli will present Tika at ApacheCon US.</li>
+        </ul>
+        <p><strong>Development</strong></p>
+        <ul>
+          <li>Tika 0.2 should be released soon.</li>
+          <li>Usage documentation has been added to the website.</li>
+        </ul>
+        <p><strong>Issues before graduation</strong></p>
+        <ul>
+          <li>
+            The current plan is to graduate as a Lucene subproject, which
+            could happen soon as the incubation criteria seem to be met.
+          </li>
+        </ul>
+      </section>
+      <section id="July 2008">
+        <title>July 2008</title>
+        <p>
+          Apache Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing parser
+          libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <ul>
+          <li>
+            The Tika community remains relatively small, with just a handful
+            of active members.
+          </li>
+        </ul>
+        <p><strong>Development</strong></p>
+        <ul>
+          <li>
+            Work towards Tika 0.2 continues; Chris Mattmann has volunteered
+            to be the release manager.
+          </li>
+        </ul>
+        <p><strong>Issues before graduation</strong></p>
+        <ul>
+          <li>
+            Increase the size and diversity of the community (or graduate
+            into a Lucene subproject?)
+          </li>
+        </ul>
+      </section>
+      <section id="April 2008">
+        <title>April 2008</title>
+        <p>
+          Apache Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing parser
+          libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <ul>
+          <li>
+            Niall Pemberton joined the project as a committer and PPMC member
+          </li>
+          <li>
+            The number of issues reported by external contributors
+            is growing gradually
+          </li>
+          <li>There was a Fast Feather Talk on Tika at ApacheCon EU 2008</li>
+          <li>We have good contacts especially with Apache POI and PDFBox</li>
+        </ul>
+        <p><strong>Development</strong></p>
+        <ul>
+          <li>We are working towards Tika 0.2</li>
+          <li>Metadata handling improvements are being discussed</li>
+        </ul>
+        <p><strong>Issues before graduation</strong></p>
+        <ul>
+          <li>Increase the size of the community</li>
+        </ul>
+      </section>
+      <section id="January 2008">
+        <title>January 2008</title>
+        <p>
+          Tika (http://incubator.apache.org/tika) is a toolkit for detecting
+          and extracting metadata and structured text content from various
+          documents using existing parser libraries. Tika entered incubation
+          on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <ul>
+          <li>
+            No new committers since the last report; activity has been
+            moderate but steady, leading to the 0.1 release.
+          </li>
+        </ul>
+        <p><strong>Development</strong></p>
+        <ul>
+          <li>Tika 0.1 (incubating) has just been released.</li>
+          <li>
+            Chris Mattmann intends to use that release in Nutch. That's
+            good progress towards Tika's goal of providing data extraction
+            functionality to other projects.
+          </li>
+          <li>
+            A new Tika logo was created by a Google Highly Open Participation
+            student but has not been integrated yet.
+          </li>
+        </ul>
+        <p><strong>Issues before graduation</strong></p>
+        <ul>
+          <li>
+            Now that the first release is out, we need to work on growing
+            the community and figuring out how to best interact with external
+            parser projects.
+          </li>
+        </ul>
+      </section>
+      <section id="October 2007">
+        <title>October 2007</title>
+        <p>
+          Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing
+          parser libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <p>
+          There have been a number of positive developments within Tika during
+          the last few months. The traffic on the Tika mailing list has increased
+          significantly (typically 2 or 3 questions, and 1 or 2 commits
+          every day or every other day), and there have been many recent
+          inquiries from external projects wanting to collaborate with Tika
+          (including Aperture, PDFBox and a developer of a JSON library
+          currently hosted at Google Code). In addition, Tika's architecture
+          has recently become a topic of discussion (as we'll see below).
+        </p>
+        <p>
+          We recently elected Keith Bennett as a new committer to Tika.
+          Keith has been spearheading many of the new patches committed to
+          Tika, as well as participating in discussions about the
+          architecture and future direction of the project.
+        </p>
+        <p>
+          Tika will be represented at the "Fast Feather" track at
+          ApacheCon US by Jukka Zitting. The rest of the community is helping
+          to create the content for the presentation. The abstract is listed
+          below:
+        </p>
+        <blockquote>
+          Tika is a new content analysis framework born from the desire to
+          factor out commonality from the Apache Nutch search engine framework.
+          Tika provides a mime detection framework, an extensible parsing
+          framework and metadata environment for content analysis. Though in
+          its nascent stages, progress on Tika has recently taken shape and
+          the project is nearing a stable 0.1 release. In this talk, we'll
+          describe the core APIs of Tika and discuss its use in several
+          distinct domains including search engines, scientific data
+          dissemination and an industrial setting.
+        </blockquote>
+        <p><strong>Development</strong></p>
+        <p>
+          There has been a flurry of JIRA issues and code activity
+          (http://issues.apache.org/jira/browse/TIKA), including 47 issues
+          currently in JIRA, with 32 resolved issues, 14 closed issues,
+          and 2 open major/minor issues in progress.
+        </p>
+        <p>
+          Tika's Parser interface (one of its key components) has just
+          undergone a major overhaul led by Jukka Zitting, and Chris
+          Mattmann has recently contributed a MimeType system (with help from
+          fellow Apache Nutch committer Jerome Charron) to Tika. We also
+          cleaned up and refactored large parts of the rest of the code
+          (removing references to LiusLite and branding the project wherever
+          possible with the Tika name), in preparation for an upcoming
+          0.1 release.
+        </p>
+        <p>
+          Chris Mattmann has led an effort to carve out the existing MimeType
+          detection system in Apache Nutch (http://lucene.apache.org/nutch/)
+          and replace it with Tika's improved MimeType detection system.
+          There is a patch sitting in JIRA right now
+          (http://issues.apache.org/jira/browse/NUTCH-562), and barring
+          objections, Nutch will rely on Tika for its MimeType detection
+          abilities.
+        </p>
+        <p>
+          Also active recently were committers Bertrand Delacretaz, Sami
+          Siren and Rida Benjelloun, committing patches and improvements
+          wherever needed.
+        </p>
+        <p><strong>Issues before graduation</strong></p>
+        <p>
+          No changes since our last report: the Tika project is still at
+          an early stage of incubation. We need to continue bringing in the
+          initial codebases and are targeting an initial incubating release
+          (0.1) probably within the next month. We also need to work on
+          growing the community and figuring out how to best interact with
+          external parser projects.
+        </p>
+      </section>
+      <section id="July 2007">
+        <title>July 2007</title>
+        <p>
+          Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various document formats using existing
+          parser libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <ul>
+          <li>
+            The Tika mailing list has seen increased activity in the last
+            weeks, with some new people showing interest in Tika's goals.
+          </li>
+          <li>
+            Grant Ingersoll brought the Aperture framework to our attention
+            (http://aperture.sourceforge.net/), which has similar goals to
+            Tika. We will look at possible synergies.
+          </li>
+        </ul>
+        <p><strong>Development</strong></p>
+        <ul>
+          <li>
+            No code has been committed since our last report, but some
+            initial code is ready in JIRA and should be committed soon.
+          </li>
+        </ul>
+        <p><strong>Issues before graduation</strong></p>
+        <ul>
+          <li>
+            No changes since our last report: the Tika project is still at
+            an early stage of incubation. We need to continue bringing in
+            the initial codebases and probably target an initial
+            incubating release later this year. We also need to work on
+            growing the community and figuring out how to best interact with
+            external parser projects.
+          </li>
+        </ul>
+      </section>
+      <section id="June 2007">
+        <title>June 2007</title>
+        <p>
+          Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing
+          parser libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+        <p><strong>Community</strong></p>
+        <p>
+          The Tika mailing lists have been relatively quiet lately, probably
+          because with little code we don't yet have many concrete issues
+          to talk about.
+        </p>
+        <p><strong>Development</strong></p>
+        <p>
+          We saw the first piece of Tika code when Chris A. Mattmann ported
+          the Nutch metadata framework to Tika. Rida Benjelloun has created
+          a version of the Lius codebase to be included in Tika, and the
+          code is currently in the issue tracker.
+        </p>
+        <p><strong>Issues before graduation</strong></p>
+        <p>
+          The Tika project is still at an early stage of incubation. We need
+          to continue bringing in the initial codebases and probably target
+          an initial incubating release later this year. We also need to
+          work on growing the community and figuring out how to best interact
+          with external parser projects.
+        </p>
+      </section>
       <section id="May 2007">
         <title>May 2007</title>
         <p>
@@ -252,14 +535,14 @@
               <th>item</th>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that the papers that transfer rights to the ASF
                   been received. It is only necessary to transfer rights for the
                   package, the core code, and any new code produced by the project.
               </td>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that the files that have been donated have been
                   updated to reflect the new ASF copyright.</td>
             </tr>
@@ -273,14 +556,14 @@
               <th>item</th>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that for all code included with the distribution
                   that is not under the Apache license, have the right to combine
                   with Apache-licensed code and redistribute.
                </td>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that all source code distributed by the project
                   is covered by one or more of the following approved licenses: Apache,
                   BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
@@ -305,7 +588,7 @@
               <td>Add all active committers in the STATUS file.</td>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Ask root for the creation of committers' accounts on
                   people.apache.org.</td>
             </tr>

Modified: incubator/public/trunk/site-publish/projects/tika.html
URL: http://svn.apache.org/viewvc/incubator/public/trunk/site-publish/projects/tika.html?rev=705400&r1=705399&r2=705400&view=diff
==============================================================================
--- incubator/public/trunk/site-publish/projects/tika.html (original)
+++ incubator/public/trunk/site-publish/projects/tika.html Thu Oct 16 16:19:50 2008
@@ -126,6 +126,10 @@
 </h2>
 <div class="section-content">
 <ul>
+        <li><strong>2008-10-01:</strong> Dave Meikle added as committer</li>
+        <li><strong>2008-04-07:</strong> Niall Pemberton added as committer</li>
+        <li><strong>2007-12-27:</strong> Apache Tika 0.1-incubating released</li>
+        <li><strong>2007-10-02:</strong> Keith Bennett added as committer</li>
         <li><strong>2007-03-22:</strong> Apache Tika begins incubation</li>
       </ul>
 </div>
@@ -164,12 +168,12 @@
         <tr>
           <td>Bug tracking</td>
           <td>jira</td>
-          <td><a href="http://issues.apache.org/jira/browse/TIKA">TIKA</a></td>
+          <td><a href="https://issues.apache.org/jira/browse/TIKA">TIKA</a></td>
         </tr>
         <tr>
           <td>Source code</td>
           <td>svn</td>
-          <td><a href="http://svn.apache.org/repos/asf/incubator/tika/">http://svn.apache.org/repos/asf/incubator/tika/</a></td>
+          <td><a href="https://svn.apache.org/repos/asf/incubator/tika/">https://svn.apache.org/repos/asf/incubator/tika/</a></td>
         </tr>
         <tr>
           <td>Sponsor</td>
@@ -193,7 +197,7 @@
         </tr>
         <tr>
           <td>Committers</td>
-          <td />
+          <td>ridabenjelloun</td>
           <td>Rida Benjelloun</td>
         </tr>
         <tr>
@@ -221,6 +225,16 @@
           <td>kbennett</td>
           <td>Keith Bennett</td>
         </tr>
+        <tr>
+          <td />
+          <td>niallp</td>
+          <td>Niall Pemberton</td>
+        </tr>
+        <tr>
+          <td />
+          <td>dmeikle</td>
+          <td>Dave Meikle</td>
+        </tr>
       </table>
 </div>
            <h2><img src="../images/redarrow.gif" />
@@ -228,6 +242,289 @@
 </h2>
 <div class="section-content">
 <h3>
+   <a name="October 2008">October 2008</a>
+</h3>
+<div class="section-content">
+<p>
+          Apache Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing parser
+          libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<ul>
+          <li>Dave Meikle was just voted in as a new committer.</li>
+          <li>Paolo Mottadelli will present Tika at ApacheCon US.</li>
+        </ul>
+<p><strong>Development</strong></p>
+<ul>
+          <li>Tika 0.2 should be released soon.</li>
+          <li>Usage documentation has been added to the website.</li>
+        </ul>
+<p><strong>Issues before graduation</strong></p>
+<ul>
+          <li>
+            The current plan is to graduate as a Lucene subproject, which
+            could happen soon as the incubation criteria seem to be met.
+          </li>
+        </ul>
+</div>
+<h3>
+   <a name="July 2008">July 2008</a>
+</h3>
+<div class="section-content">
+<p>
+          Apache Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing parser
+          libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<ul>
+          <li>
+            The Tika community remains relatively small, with just a handful
+            of active members.
+          </li>
+        </ul>
+<p><strong>Development</strong></p>
+<ul>
+          <li>
+            Work towards Tika 0.2 continues; Chris Mattmann has volunteered
+            to be the release manager.
+          </li>
+        </ul>
+<p><strong>Issues before graduation</strong></p>
+<ul>
+          <li>
+            Increase the size and diversity of the community (or graduate
+            into a Lucene subproject?)
+          </li>
+        </ul>
+</div>
+<h3>
+   <a name="April 2008">April 2008</a>
+</h3>
+<div class="section-content">
+<p>
+          Apache Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing parser
+          libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<ul>
+          <li>
+            Niall Pemberton joined the project as a committer and PPMC member
+          </li>
+          <li>
+            The number of issues reported by external contributors
+            is growing gradually
+          </li>
+          <li>There was a Fast Feather Talk on Tika at ApacheCon EU 2008</li>
+          <li>We have good contacts especially with Apache POI and PDFBox</li>
+        </ul>
+<p><strong>Development</strong></p>
+<ul>
+          <li>We are working towards Tika 0.2</li>
+          <li>Metadata handling improvements are being discussed</li>
+        </ul>
+<p><strong>Issues before graduation</strong></p>
+<ul>
+          <li>Increase the size of the community</li>
+        </ul>
+</div>
+<h3>
+   <a name="January 2008">January 2008</a>
+</h3>
+<div class="section-content">
+<p>
+          Tika (http://incubator.apache.org/tika) is a toolkit for detecting
+          and extracting metadata and structured text content from various
+          documents using existing parser libraries. Tika entered incubation
+          on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<ul>
+          <li>
+            No new committers since the last report; activity has been
+            moderate but steady, leading to the 0.1 release.
+          </li>
+        </ul>
+<p><strong>Development</strong></p>
+<ul>
+          <li>Tika 0.1 (incubating) has just been released.</li>
+          <li>
+            Chris Mattmann intends to use that release in Nutch. That's
+            good progress towards Tika's goal of providing data extraction
+            functionality to other projects.
+          </li>
+          <li>
+            A new Tika logo was created by a Google Highly Open Participation
+            student but has not been integrated yet.
+          </li>
+        </ul>
+<p><strong>Issues before graduation</strong></p>
+<ul>
+          <li>
+            Now that the first release is out, we need to work on growing
+            the community and figuring out how to best interact with external
+            parser projects.
+          </li>
+        </ul>
+</div>
+<h3>
+   <a name="October 2007">October 2007</a>
+</h3>
+<div class="section-content">
+<p>
+          Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing
+          parser libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<p>
+          There have been a number of positive developments within Tika during
+          the last few months. The traffic on the Tika mailing list has increased
+          significantly (typically 2 or 3 questions, and 1 or 2 commits
+          every day or every other day), and there have been many recent
+          inquiries from external projects wanting to collaborate with Tika
+          (including Aperture, PDFBox and a developer of a JSON library
+          currently hosted at Google Code). In addition, Tika's architecture
+          has recently become a topic of discussion (as we'll see below).
+        </p>
+<p>
+          We recently elected Keith Bennett as a new committer to Tika.
+          Keith has been spearheading many of the new patches committed to
+          Tika, as well as participating in discussions about the
+          architecture and future direction of the project.
+        </p>
+<p>
+          Tika will be represented at the "Fast Feather" track at
+          ApacheCon US by Jukka Zitting. The rest of the community is helping
+          to create the content for the presentation. The abstract is listed
+          below:
+        </p>
+<blockquote>
+          Tika is a new content analysis framework born from the desire to
+          factor out commonality from the Apache Nutch search engine framework.
+          Tika provides a mime detection framework, an extensible parsing
+          framework and metadata environment for content analysis. Though in
+          its nascent stages, progress on Tika has recently taken shape and
+          the project is nearing a stable 0.1 release. In this talk, we'll
+          describe the core APIs of Tika and discuss its use in several
+          distinct domains including search engines, scientific data
+          dissemination and an industrial setting.
+        </blockquote>
+<p><strong>Development</strong></p>
+<p>
+          There has been a flurry of JIRA issues and code activity
+          (http://issues.apache.org/jira/browse/TIKA), including 47 issues
+          currently in JIRA, with 32 resolved issues, 14 closed issues,
+          and 2 open major/minor issues in progress.
+        </p>
+<p>
+          Tika's Parser interface (one of its key components) has just
+          undergone a major overhaul led by Jukka Zitting, and Chris
+          Mattmann has recently contributed a MimeType system (with help from
+          fellow Apache Nutch committer Jerome Charron) to Tika. We also
+          cleaned up and refactored large parts of the rest of the code
+          (removing references to LiusLite and branding the project wherever
+          possible with the Tika name), in preparation for an upcoming
+          0.1 release.
+        </p>
+<p>
+          Chris Mattmann has led an effort to carve out the existing MimeType
+          detection system in Apache Nutch (http://lucene.apache.org/nutch/)
+          and replace it with Tika's improved MimeType detection system.
+          There is a patch sitting in JIRA right now
+          (http://issues.apache.org/jira/browse/NUTCH-562), and barring
+          objections, Nutch will rely on Tika for its MimeType detection
+          abilities.
+        </p>
+<p>
+          Also active recently were committers Bertrand Delacretaz, Sami
+          Siren and Rida Benjelloun, committing patches and improvements
+          wherever needed.
+        </p>
+<p><strong>Issues before graduation</strong></p>
+<p>
+          No changes since our last report: the Tika project is still at
+          an early stage of incubation. We need to continue bringing in the
+          initial codebases and are targeting an initial incubating release
+          (0.1) probably within the next month. We also need to work on
+          growing the community and figuring out how to best interact with
+          external parser projects.
+        </p>
+</div>
+<h3>
+   <a name="July 2007">July 2007</a>
+</h3>
+<div class="section-content">
+<p>
+          Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various document formats using existing
+          parser libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<ul>
+          <li>
+            The Tika mailing list has seen increased activity in the last
+            weeks, with some new people showing interest in Tika's goals.
+          </li>
+          <li>
+            Grant Ingersoll brought the Aperture framework to our attention
+            (http://aperture.sourceforge.net/), which has similar goals to
+            Tika. We will look at possible synergies.
+          </li>
+        </ul>
+<p><strong>Development</strong></p>
+<ul>
+          <li>
+            No code has been committed since our last report, but some
+            initial code is ready in JIRA and should be committed soon.
+          </li>
+        </ul>
+<p><strong>Issues before graduation</strong></p>
+<ul>
+          <li>
+            No changes since our last report: the Tika project is still at
+            an early stage of incubation. We need to continue bringing in
+            the initial codebases and probably target an initial
+            incubating release later this year. We also need to work on
+            growing the community and figuring out how to best interact with
+            external parser projects.
+          </li>
+        </ul>
+</div>
+<h3>
+   <a name="June 2007">June 2007</a>
+</h3>
+<div class="section-content">
+<p>
+          Tika is a toolkit for detecting and extracting metadata and
+          structured text content from various documents using existing
+          parser libraries. Tika entered incubation on March 22nd, 2007.
+        </p>
+<p><strong>Community</strong></p>
+<p>
+          The Tika mailing lists have been relatively quiet lately, probably
+          because with little code we don't yet have many concrete issues
+          to talk about.
+        </p>
+<p><strong>Development</strong></p>
+<p>
+          We saw the first piece of Tika code when Chris A. Mattmann ported
+          the Nutch metadata framework to Tika. Rida Benjelloun has created
+          a version of the Lius codebase to be included in Tika, and the
+          code is currently in the issue tracker.
+        </p>
+<p><strong>Issues before graduation</strong></p>
+<p>
+          The Tika project is still at an early stage of incubation. We need
+          to continue bringing in the initial codebases and probably target
+          an initial incubating release later this year. We also need to
+          work on growing the community and figuring out how to best interact
+          with external parser projects.
+        </p>
+</div>
+<h3>
    <a name="May 2007">May 2007</a>
 </h3>
 <div class="section-content">
@@ -374,14 +671,14 @@
               <th>item</th>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that the papers that transfer rights to the ASF
                   been received. It is only necessary to transfer rights for the
                   package, the core code, and any new code produced by the project.
               </td>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that the files that have been donated have been
                   updated to reflect the new ASF copyright.</td>
             </tr>
@@ -397,14 +694,14 @@
               <th>item</th>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that for all code included with the distribution
                   that is not under the Apache license, have the right to combine
                   with Apache-licensed code and redistribute.
                </td>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Check and make sure that all source code distributed by the project
                   is covered by one or more of the following approved licenses: Apache,
                   BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
@@ -431,7 +728,7 @@
               <td>Add all active committers in the STATUS file.</td>
             </tr>
             <tr>
-              <td>....-..-..</td>
+              <td>2008-10-17</td>
               <td>Ask root for the creation of committers' accounts on
                   people.apache.org.</td>
             </tr>



---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

