From: Grant Ingersoll
To: Apache Board
Cc: general@lucene.apache.org
Subject: [REPORT] Lucene December 2009 Board Report
Date: Wed, 9 Dec 2009 08:49:03 -0500
Message-Id: <86616F9A-8F74-4B34-8BBB-3A604BF94683@apache.org>

=== Lucene Status Report: December, 2009 ===

TLP
- The PMC added George Aroush and Chris Mattmann to the PMC
- The PMC added Open Relevance committer Robert Muir
- The PMC added Mahout committer Jake Mannix
- The PMC added Tika committer Ken Krugler

LUCENE JAVA

Lucene Java is a search-engine toolkit. Development has been active, and we released both 2.9 and 3.0 this quarter.

SOLR

Solr is a full-text search server using Lucene Java. Development and the community are active. Solr released version 1.4 this quarter.

NUTCH

Nutch is a web-search engine: crawler, indexer and search runtime. There has been a recent flurry of discussion about Nutch's future following ApacheCon, spearheaded by Andrzej Bialecki and others. In addition, there is ongoing work on reducing code duplication (tighter integration of the Tika parsing framework and mime type detection, better Solr integration) and on using a more flexible storage system (e.g. HBase). Many issues are being fixed in preparation for a 1.1 release early next quarter.

LUCY

Lucy is a loose C port of Lucene targeted at dynamic language bindings. Development this quarter has focused on abstraction of the IO subsystem and portability to various compiler platforms.

LUCENE.NET

Lucene.NET is a .NET-based port of Lucene Java. Development and the community are active. Lucene.NET graduated from the incubator and is now a full-fledged Lucene sub-project.

Mahout

Apache Mahout is working towards building a suite of scalable machine learning libraries for text and data mining. Development is active and version 0.2 was released this quarter.

Open Relevance Project

The Open Relevance Project is a new project aimed at providing Lucene and others with tools for judging the quality of search and machine learning approaches. The project added Robert Muir as a committer this quarter and development is getting under way. Recent work has added support for the Indonesian "Tempo" and Persian "Hamshahri" collections to execute relevance judgements with lucene-benchmark.
PyLucene

PyLucene is a Python integration of Lucene Java. Development is active. Closely tracking the Lucene Java releases, we released PyLucene 2.9.0, PyLucene 2.9.1 and PyLucene 3.0.0 this quarter. A major addition was made to JCC, the code generator that makes PyLucene possible: support for Java generics, which Lucene Java 3.0 now uses.

TIKA

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika released version 0.5 this quarter. There have been recent development efforts to speed up Tika's mime detector, as well as efforts to provide a self-contained OSGi-based Tika bundle. There is a strong desire to release these post-0.5 improvements, so we are planning to release Tika 0.6 in the next few weeks.