Message-ID: <468EBF5E.90508@apache.org>
Date: Fri, 06 Jul 2007 15:17:02 -0700
From: Doug Cutting <cutting@apache.org>
To: java-dev@lucene.apache.org
Subject: Re: Spliting index
References: <804507450705281210r34faa59fka6a44d925d31c23f@mail.gmail.com>
In-Reply-To: <804507450705281210r34faa59fka6a44d925d31c23f@mail.gmail.com>

You can implement a FilterIndexReader that returns only a subset of an 
index.  Then use IndexWriter#addIndexes() to add this to a new, empty 
index.  Do this for each range of terms.
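
To make the shape of this concrete, here is a toy sketch in plain Java. It 
does not use any Lucene classes: a sorted term -> posting-list map stands in 
for the index, filteredView() plays the role of the FilterIndexReader 
subclass, and addToEmptyIndex() plays the role of IndexWriter#addIndexes(). 
All names in it (TermRangeSplit, filteredView, addToEmptyIndex, 
splitByFirstChar) are hypothetical and only illustrate the "one filtered 
view per term range, copied into a new empty index" idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model only: a term -> posting-list map stands in for a Lucene index. */
final class TermRangeSplit {

    /**
     * View exposing only the terms in [lower, upper). A null upper bound means
     * "no upper bound". This stands in for the filtering reader.
     */
    public static SortedMap<String, List<Integer>> filteredView(
            SortedMap<String, List<Integer>> index, String lower, String upper) {
        return upper == null ? index.tailMap(lower) : index.subMap(lower, upper);
    }

    /** Copies every term and posting list of the view into a new, empty index. */
    public static SortedMap<String, List<Integer>> addToEmptyIndex(
            SortedMap<String, List<Integer>> view) {
        SortedMap<String, List<Integer>> target = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : view.entrySet()) {
            target.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return target;
    }

    /** Builds one sub-index per leading character (empty terms are skipped). */
    public static SortedMap<Character, SortedMap<String, List<Integer>>> splitByFirstChar(
            SortedMap<String, List<Integer>> index) {
        SortedMap<Character, SortedMap<String, List<Integer>>> parts = new TreeMap<>();
        for (String term : index.keySet()) {
            if (term.isEmpty()) {
                continue;
            }
            char c = term.charAt(0);
            if (parts.containsKey(c)) {
                continue; // this range was already copied
            }
            String lower = String.valueOf(c);
            // The next character is the exclusive upper bound of the range;
            // the last possible char has no successor, so leave it unbounded.
            String upper = (c == Character.MAX_VALUE) ? null : String.valueOf((char) (c + 1));
            parts.put(c, addToEmptyIndex(filteredView(index, lower, upper)));
        }
        return parts;
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> index = new TreeMap<>();
        index.put("apple", Arrays.asList(1, 2));
        index.put("avocado", Arrays.asList(3));
        index.put("banana", Arrays.asList(2));
        System.out.println(splitByFirstChar(index));
    }
}
```

In a real index the filtered reader would also have to present documents, 
norms and so on consistently for each range; the toy map above ignores all 
of that.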

This is somewhat similar to Nutch's IndexSorter:

http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSorter.java?view=markup

Note that IndexWriter#addIndexes() doesn't require that all IndexReader 
methods be implemented.

Doug

Daniel Creão wrote:
> I'd like to split my Lucene index into smaller segments, each one holding all
> terms starting with the same character.
> 
> I started writing the Terms and TermInfos, but I'm worried about the other
> files and especially the pointers.
> 
> What should I take care of while splitting the index?
> 
> - Daniel
> 
