Message-ID: <10980148.1190156323614.JavaMail.jira@brutus>
Date: Tue, 18 Sep 2007 15:58:43 -0700 (PDT)
From: "stack (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1913) [HBase] Build a Lucene index on an HBase table
In-Reply-To: <16400495.1190071904984.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528602 ]

stack commented on HADOOP-1913:
-------------------------------

That the configuration is per
job rather than per instance is an important distinction. Could an XML file be passed to jobs on the command line?

> [HBase] Build a Lucene index on an HBase table
> ----------------------------------------------
>
>                 Key: HADOOP-1913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1913
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Ning Li
>            Priority: Minor
>         Attachments: build_table_index.patch, build_table_index.take2.again.patch, build_table_index.take2.patch
>
>
> This patch provides a Reducer class and related classes that help build a Lucene index on an HBase table. The index build part is similar to that of Nutch.
> - Each row is modeled as a Lucene document: the row key is indexed in its untokenized form, and column name-value pairs become Lucene field name-value pairs.
> - IndexConf is used to configure various Lucene parameters and to specify whether to optimize an index and which columns to index and/or store, in tokenized or untokenized form, etc.
> - The number of reduce tasks determines the number of indexes (partitions). The indexes are stored in the output path of the job configuration.
> - The index build is done in the reduce phase. Users can use the map phase to join rows from different tables or to pre-parse/analyze column content, etc.
> - A JUnit test is added to test building an index on an HBase table with an identity mapper. It also serves as an example of how to use the new classes.
> - BuildTableIndex is provided to help build an index on an HBase table. It should be moved to an examples package if HBase decides to have one.
> This patch requires the inclusion of the Lucene library.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
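
The row-to-document modeling described in the quoted issue can be sketched in plain Java. This is only an illustration under stated assumptions, not code from the attached patches: Field and ColumnSetting are hypothetical stand-ins (not the Lucene API, not IndexConf), the field name "ROWKEY" is made up, and storing the row key and skipping unconfigured columns are choices of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain-Java stand-ins, not the classes from the attached
// patches and not the Lucene API.
class RowToDocumentSketch {

    // Hypothetical stand-in for a Lucene field.
    static final class Field {
        final String name;
        final String value;
        final boolean stored;
        final boolean tokenized;

        Field(String name, String value, boolean stored, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.tokenized = tokenized;
        }

        @Override
        public String toString() {
            return name + "=" + value + " [stored=" + stored + ", tokenized=" + tokenized + "]";
        }
    }

    // Hypothetical per-column settings; the issue says IndexConf covers such choices.
    static final class ColumnSetting {
        final boolean store;
        final boolean tokenize;

        ColumnSetting(boolean store, boolean tokenize) {
            this.store = store;
            this.tokenize = tokenize;
        }
    }

    // Models one table row as a list of fields (the "document").
    static List<Field> toDocument(String rowKey,
                                  Map<String, String> columns,
                                  Map<String, ColumnSetting> settings,
                                  String rowKeyField) {
        List<Field> doc = new ArrayList<>();
        // The row key is kept untokenized; storing it is an assumption of this sketch.
        doc.add(new Field(rowKeyField, rowKey, true, false));
        for (Map.Entry<String, String> column : columns.entrySet()) {
            ColumnSetting setting = settings.get(column.getKey());
            if (setting == null) {
                continue; // assumption: columns without a setting are left out
            }
            doc.add(new Field(column.getKey(), column.getValue(),
                              setting.store, setting.tokenize));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("contents:", "hello hbase");
        columns.put("anchor:", "not configured");

        Map<String, ColumnSetting> settings = new LinkedHashMap<>();
        settings.put("contents:", new ColumnSetting(false, true));

        for (Field field : toDocument("row1", columns, settings, "ROWKEY")) {
            System.out.println(field);
        }
    }
}
```

In a real job this mapping would run in the reducer, once per row, with the resulting document handed to a Lucene index writer for that reduce task's partition.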