lucene-java-user mailing list archives

From Shashi Kant <sk...@sloan.mit.edu>
Subject Re: Index Partitioning
Date Tue, 24 Mar 2009 01:01:20 GMT
This is perfect, exactly what I was looking for. Thanks much Andrzej!


On Mon, Mar 23, 2009 at 1:43 AM, Andrzej Bialecki <ab@getopt.org> wrote:

> Shashi Kant wrote:
>
>> Is there an "elegant" approach to partitioning a large Lucene index (~1TB)
>> into smaller sub-indexes other than the obvious method of re-indexing into
>> partitions?
>> Any ideas?
>>
>
> Try the following:
>
> * open your index, and mark all documents as deleted except 1/Nth that
> should fill the first shard. Close the index, BUT DO NOT OPTIMIZE IT!
>
> * create an IndexWriter for the new shard, and use addIndexes to add the
> original index. Only non-deleted docs will be copied.
>
> * open the original index and use undeleteAll() to revert the deletions.
>
> * mark all documents as deleted except the next 1/Nth
> ...
> * repeat the cycle as many times as needed
>
> A more elegant version of this algorithm can be implemented using
> FilterIndexReader.
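
The cycle described above can be sketched without Lucene by modelling the index in memory as a list of documents plus one "deleted" bit per document. This is only an illustration of the delete / copy / undelete loop, not Lucene code: the class ShardSplitSketch and its methods are hypothetical names, and splitting by contiguous doc id ranges is an assumption (the message only says "1/Nth"). In Lucene of that era the three steps would presumably map to IndexReader.deleteDocument(int), IndexWriter.addIndexes and IndexReader.undeleteAll().

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** In-memory stand-in for an index: documents plus a per-document "deleted" bit. */
final class ShardSplitSketch {

    /** First doc id (inclusive) of shard i when maxDoc ids are split into n shards. */
    static int shardStart(int maxDoc, int n, int i) {
        // long arithmetic avoids overflow for large maxDoc * i
        return (int) ((long) maxDoc * i / n);
    }

    /** Runs the delete / copy / undelete cycle once per shard. */
    static List<List<String>> split(List<String> docs, int n) {
        int maxDoc = docs.size();
        BitSet deleted = new BitSet(maxDoc);
        List<List<String>> shards = new ArrayList<>();

        for (int i = 0; i < n; i++) {
            int from = shardStart(maxDoc, n, i);
            int to = shardStart(maxDoc, n, i + 1);

            // Step 1: mark every document deleted except the range [from, to).
            deleted.set(0, maxDoc);
            deleted.clear(from, to);

            // Step 2: copy the source into a new shard; only non-deleted docs survive.
            List<String> shard = new ArrayList<>();
            for (int id = deleted.nextClearBit(0); id < maxDoc;
                     id = deleted.nextClearBit(id + 1)) {
                shard.add(docs.get(id));
            }
            shards.add(shard);

            // Step 3: revert all deletions so the source is intact for the next pass.
            deleted.clear();
        }
        return shards;
    }
}
```

Calling ShardSplitSketch.split(docs, 3) on ten documents yields shards of 3, 3 and 4 documents that together contain every original document exactly once. Step 3 only works because the deletions are just marks: in the real index, optimizing would physically drop the marked documents, which is why the source index must not be optimized between passes.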
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
