lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: distributing the indexing process
Date Thu, 07 Jul 2011 03:49:39 GMT
We've used Hadoop MapReduce with Solr to parallelize indexing for a customer and that brought
down their multi-hour indexing process down to a couple of minutes.  There is/was also Lucene-level
contrib in Hadoop that makes use of MapReduce to parallelize indexing.

Otis

----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message -----
> From: Guru Chandar <Guru.Chandar@consona.com>
> To: java-user@lucene.apache.org
> Cc: 
> Sent: Thursday, June 30, 2011 5:12 AM
> Subject: distributing the indexing process
> 
> 
> 
> If we have to index a lot of documents, is there a way to divide the
> documents into multiple sets and index them on multiple machines in
> parallel, and then merge the resulting indexes back into a single
> machine? If yes, will the result be logically equivalent to indexing all
> the documents on a single machine?
> 
> 
> 
> Thanks,
> 
> -gc
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message