lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrian Tarau <adrian.ta...@gmail.com>
Subject Re: Does lucene support distributed indexing?
Date Thu, 12 Jun 2008 18:09:56 GMT

I've started an year ago a different implementation of ParallelMultiSearcher
using a ThreadPoolExecutor where everything is parallelized.
Unfortunately, I had to interrupt this and work on something else, but this
month I'll start working again. Right now there are some dependencies so it
cannot be used outside my infrastructure(like discovering new nodes,
notifications between nodes), but I'm thinking to extract this as a separate
project(maybe latter) so can be used as an Lucene extension.

I will post some code as soon as I will have something to show :)

Thanks.






Otis Gospodnetic wrote:
> 
> There are actually several distributed indexing or searching projects in
> Lucene (the top-level ASF Lucene project, not Lucene Java), and it's time
> to start thinking about the possibility of bringing them together, finding
> commonalities, etc.
> 
> Here is the summary:
> - Lucene - distributed search via ParallelMultiSearcher.  How you split
> indices/shards is up to you.
> - Solr - distributed indexing via SOLR-303 (see DistributedSearch on its
> Wiki).  How you split indices/shards is up to you.
> - Nutch - see its org.apache.nutch.ipc (I think).  How you split
> indices/segments is up to you.
> - Nutch - see the bottom of
> http://wiki.apache.org/nutch/Nutch2Architecture
> 
> There is also Hadoop:
> - Using MapReduce + HDFS to build a single Lucene index in a distributed
> fashion (see contrib/ in Hadoop)
> 
> There is also GridLucene project somewhere on the web...
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
>> From: Grant Ingersoll <gsingers@apache.org>
>> To: java-user@lucene.apache.org
>> Sent: Saturday, April 26, 2008 4:20:19 PM
>> Subject: Re: Does lucene support distributed indexing?
>> 
>> 
>> On Apr 26, 2008, at 2:33 AM, Samuel Guo wrote:
>> 
>> > Hi all,
>> >
>> > I am a lucene newbie:)
>> >
>> > It seems that lucene doesn't support distributed indexing:(
>> > As some IR research papers mentioned, when the documents collection  
>> > become
>> > large, the index will be large also. When one single machine can't  
>> > hold all
>> > the index, some strategies are used to solve it. such as that we can  
>> > part
>> > the whole collection into several small sub-collections. According to
>> > different partitions, we can got different strategies : document- 
>> > partittion
>> > and term-partition. but I don't know why not lucene support these  
>> > ways:(
>> > can't anyone explain it ?
>> 
>> Because no one has donated the code to do it.  You can do distributed  
>> indexing via Nutch and some (albeit non fault tolerant) distributed  
>> Search in Lucene.  Solr also now has distributed search.
>> 
>> -Grant
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Does-lucene-support-distributed-indexing--tp16909912p17806254.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message