From: Doug Cutting
To: java-dev@lucene.apache.org
Date: Fri, 08 Dec 2006 12:21:45 -0800
Subject: Re: Spliting the Lucene

howard chen wrote:
> Can you suggest if using Hadoop + Lucene, how to make a simple
> distributed indexing & searching program, i.e.
> what are the mapping / reducing processes involved in both indexing
> and searching?

There is not yet a universal best practice for this.

Nutch provides an example of how to use Lucene for distributed
indexing. Nutch's current distributed search implementation builds on
Hadoop's RPC mechanism, but is not based on Hadoop's MapReduce.

http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/DistributedSearch.html

There has been some discussion of MapReduce-based distributed search on
the Nutch lists, e.g.:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200604.mbox/%3C4448063D.8050406@apache.org%3E

I think Andrzej Bialecki has explored this approach somewhat.

Another approach is to build a non-MapReduce-based system specifically
for supporting distributed search and indexing. I started a discussion
about this a few months ago and hope to start work on this project
before long.

http://www.nabble.com/-PROPOSAL--index-server-project-tf2469695.html

Doug
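The question asks which map and reduce steps are involved, and the reply deliberately names no single recipe. One common sharded layout can still be illustrated with a toy, in-memory sketch. Indexing routes each document to one of N shards by a stable hash of its id (roughly the "map" side) and builds a small inverted index per shard (roughly the "reduce" side, as if each reducer wrote one shard). Searching sends the query to every shard and merges the per-shard top hits. This is an assumption-laden illustration, not how Nutch or Lucene do it: no Lucene, Hadoop, or Nutch APIs are used, and the function names (`shard_for`, `build_shards`, `search_shard`, `distributed_search`), the CRC32 routing, and the term-count scoring are all hypothetical.

```python
import heapq
import zlib
from collections import Counter

# Toy sketch only: hypothetical names, no Lucene/Hadoop/Nutch APIs involved.

def shard_for(doc_id, num_shards):
    """Route a document to a shard by a stable hash of its id.

    CRC32 is used instead of Python's hash(), which is randomized per
    process for strings and so would not give repeatable routing.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def build_shards(docs, num_shards):
    """Indexing: assign each (doc_id, text) pair to a shard ('map'), then
    build one toy inverted index per shard ('reduce'), mapping
    term -> {doc_id: term frequency}."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs:
        index = shards[shard_for(doc_id, num_shards)]
        for term, tf in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = tf
    return shards

def search_shard(index, query, k):
    """Score documents on one shard by summed term frequency (a toy score)
    and return the local top k as (doc_id, score), best first."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda hit: (-hit[1], hit[0]))[:k]

def distributed_search(shards, query, k=10):
    """Searching: scatter the query to every shard, then gather and merge
    the per-shard top k lists into a global top k (ties broken by doc_id)."""
    candidates = (hit for index in shards for hit in search_shard(index, query, k))
    return heapq.nsmallest(k, candidates, key=lambda hit: (-hit[1], hit[0]))

docs = [
    ("d1", "lucene hadoop index"),
    ("d2", "nutch search"),
    ("d3", "hadoop hadoop mapreduce"),
]
shards = build_shards(docs, num_shards=2)
print(distributed_search(shards, "hadoop", k=2))  # [('d3', 2), ('d1', 1)]
```

Merging per-shard top k lists gives the exact global top k here only because each document lives on exactly one shard and the toy score depends on that document alone. A real Lucene score also uses collection-wide statistics such as idf, which differ from shard to shard unless they are shared across the cluster.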