Message-ID: <17806254.post@talk.nabble.com>
Date: Thu, 12 Jun 2008 11:09:56 -0700 (PDT)
From: Adrian Tarau
To: java-user@lucene.apache.org
Subject: Re: Does lucene support distributed indexing?

I started a different implementation of ParallelMultiSearcher a year ago,
using a ThreadPoolExecutor so that everything is parallelized. Unfortunately,
I had to interrupt this to work on something else, but I'll start working on
it again this month. Right now there are some dependencies, so it cannot be
used outside my infrastructure (things like discovering new nodes and
notifications between nodes), but I'm thinking of extracting it as a separate
project (maybe later) so it can be used as a Lucene extension.

I will post some code as soon as I have something to show :)

Thanks.


Otis Gospodnetic wrote:
>
> There are actually several distributed indexing or searching projects in
> Lucene (the top-level ASF Lucene project, not Lucene Java), and it's time
> to start thinking about the possibility of bringing them together, finding
> commonalities, etc.
>
> Here is the summary:
> - Lucene - distributed search via ParallelMultiSearcher.  How you split
> indices/shards is up to you.
> - Solr - distributed indexing via SOLR-303 (see DistributedSearch on its
> Wiki).  How you split indices/shards is up to you.
> - Nutch - see its org.apache.nutch.ipc (I think).  How you split
> indices/segments is up to you.
> - Nutch - see the bottom of
> http://wiki.apache.org/nutch/Nutch2Architecture
>
> There is also Hadoop:
> - Using MapReduce + HDFS to build a single Lucene index in a distributed
> fashion (see contrib/ in Hadoop)
>
> There is also GridLucene project somewhere on the web...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: Grant Ingersoll
>> To: java-user@lucene.apache.org
>> Sent: Saturday, April 26, 2008 4:20:19 PM
>> Subject: Re: Does lucene support distributed indexing?
>>
>>
>> On Apr 26, 2008, at 2:33 AM, Samuel Guo wrote:
>>
>> > Hi all，
>> >
>> > I am a lucene newbie:)
>> >
>> > It seems that lucene doesn't support distributed indexing:(
>> > As some IR research papers mentioned, when the documents collection become
>> > large, the index will be large also. When one single machine can't hold all
>> > the index, some strategies are used to solve it. such as that we can part
>> > the whole collection into several small sub-collections. According to
>> > different partitions, we can got different strategies : document-partittion
>> > and term-partition. but I don't know why not lucene support these ways:(
>> > can't anyone explain it ?
>>
>> Because no one has donated the code to do it.  You can do distributed
>> indexing via Nutch and some (albeit non fault tolerant) distributed
>> Search in Lucene.  Solr also now has distributed search.
>>
>> -Grant
>

-- 
View this message in context: http://www.nabble.com/Does-lucene-support-distributed-indexing--tp16909912p17806254.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
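
For illustration only, the general idea of fanning a query out to several
index shards through a ThreadPoolExecutor and merging the results could be
sketched as below. This is not the implementation described in this message
and it does not use Lucene's API: the names ParallelShardSearch, Shard, Hit,
newPool and search are all hypothetical, and a real version would wrap a
Lucene searcher per shard instead of the stand-in Shard interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ParallelShardSearch {

    /** Hypothetical scored hit; a stand-in, not a Lucene class. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical per-shard searcher (assumption: returns its own top hits). */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Fixed-size pool backed by an unbounded queue. */
    static ExecutorService newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    /** Queries every shard concurrently and returns the global top N by score. */
    static List<Hit> search(List<Shard> shards, String query, int topN,
                            ExecutorService pool) {
        // Submit one task per shard so the shard searches run in parallel.
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard shard : shards) {
            Callable<List<Hit>> task = () -> shard.search(query, topN);
            futures.add(pool.submit(task));
        }

        // Collect results in submission order; get() blocks until each is done.
        List<Hit> merged = new ArrayList<>();
        try {
            for (Future<List<Hit>> future : futures) {
                merged.addAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard search failed", e.getCause());
        }

        // Highest score first; the sort is stable, so ties keep shard order.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

Each shard is asked for its own top N, which is enough to compute the global
top N after merging. Note that this simple merge assumes scores from
different shards are comparable, which is not automatically true when each
shard computes its own term statistics.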

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org