lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Index Partitioning
Date Mon, 23 Mar 2009 05:43:40 GMT

Shashi Kant wrote:
> Is there an "elegant" approach to partitioning a large Lucene index (~1TB)
> into smaller sub-indexes other than the obvious method of re-indexing into
> partitions?
> Any ideas?

Try the following:

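* open your index and mark all documents as deleted except the 1/Nth that
should fill the first shard. Close the index, BUT DO NOT OPTIMIZE IT!
(Optimizing would permanently purge the deleted documents, so they could
no longer be restored.)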

* create an IndexWriter for the new shard and use addIndexes to add the
original index. Only non-deleted docs will be copied.

* open the original index and use undeleteAll() to revert the deletions.

* mark all documents as deleted except the next 1/Nth, which should fill
the second shard
...
* repeat the cycle as many times as needed
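
The bookkeeping behind the steps above can be sketched in plain Java. This is
only an illustration, not code from this message: the class ShardPlan and its
methods are hypothetical names, and it assumes each shard takes a contiguous
range of document IDs and that the index has no deletions of its own before
you start. It computes which document IDs to mark as deleted for a given
shard; the actual Lucene calls (deleting each ID through the IndexReader,
addIndexes, undeleteAll()) are only indicated in comments.

```java
import java.util.BitSet;

// Hypothetical helper: plans which doc IDs to mark deleted for each shard.
class ShardPlan {

    // First doc ID (inclusive) of the given shard when IDs 0..maxDoc-1 are
    // split into numShards contiguous ranges. long math avoids overflow.
    static int start(int maxDoc, int numShards, int shard) {
        return (int) ((long) maxDoc * shard / numShards);
    }

    // Bit i is set when doc i must be marked deleted before copying `shard`,
    // i.e. for every doc outside that shard's range.
    static BitSet docsToDelete(int maxDoc, int numShards, int shard) {
        int from = start(maxDoc, numShards, shard);
        int to = start(maxDoc, numShards, shard + 1); // exclusive
        BitSet deleted = new BitSet(maxDoc);
        deleted.set(0, maxDoc);   // start with everything deleted
        deleted.clear(from, to);  // keep only this shard's range
        return deleted;
    }

    public static void main(String[] args) {
        int maxDoc = 10;    // assumed: the reader's maxDoc()
        int numShards = 3;  // N
        for (int shard = 0; shard < numShards; shard++) {
            BitSet deleted = docsToDelete(maxDoc, numShards, shard);
            // With Lucene, per cycle (not executed here):
            //   1. mark every set bit as deleted, close WITHOUT optimizing
            //   2. addIndexes the original index into the shard's writer
            //   3. reopen the original index and call undeleteAll()
            int kept = maxDoc - deleted.cardinality();
            System.out.println("shard " + shard + " keeps " + kept + " docs");
        }
    }
}
```

Contiguous ranges are just the simplest choice; any rule that assigns every
document to exactly one shard would do.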

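A more elegant version of this algorithm can be implemented using
FilterIndexReader: wrap the original reader so that documents outside the
current shard appear deleted, and pass the wrapper to addIndexes. The
original index is then never modified.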

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

