From: Doug Cutting
Date: Fri, 06 Jul 2007 15:17:02 -0700
To: java-dev@lucene.apache.org
Subject: Re: Spliting index

You can implement a FilterIndexReader that returns only a subset of an index.
Then use IndexWriter#addIndexes() to add this to a new, empty index. Do this for each range of terms.

This is somewhat similar to Nutch's IndexSorter:

http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSorter.java?view=markup

Note that IndexWriter#addIndexes() doesn't require that all IndexReader methods be implemented.

Doug

Daniel Cre�o wrote:
> I'd wanna split my lucene index in smaller segments, each one holding all
> terms starting with the same char.
>
> I started writing Term's and TermInfo's but i'm worried about others files
> and especially the pointers.
>
> What care should I have while splitting index?
>
> - Daniel
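
The filter-then-copy approach suggested in the reply can be illustrated with a toy model. The sketch below does not use Lucene's classes at all: the "index" is assumed to be a plain map from term to document ids, and the names SplitByFirstChar, filteredView, addIndex and split are hypothetical. filteredView plays the role a FilterIndexReader subclass would play (exposing only a subset of terms), and addIndex plays the role of IndexWriter#addIndexes() (copying that subset into a new, empty index). A real index also carries stored fields, norms and positions, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitByFirstChar {

    // Toy stand-in for a filtered reader: expose only the terms that
    // start with the given character. Not Lucene API.
    static Map<String, List<Integer>> filteredView(Map<String, List<Integer>> index, char first) {
        Map<String, List<Integer>> view = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            String term = e.getKey();
            if (!term.isEmpty() && term.charAt(0) == first) {
                view.put(term, e.getValue());
            }
        }
        return view;
    }

    // Toy stand-in for adding an index to a writer: copy every term and
    // its postings from the view into the target index.
    static void addIndex(Map<String, List<Integer>> target, Map<String, List<Integer>> view) {
        for (Map.Entry<String, List<Integer>> e : view.entrySet()) {
            target.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
    }

    // Build one new, initially empty index per distinct first character.
    static Map<Character, Map<String, List<Integer>>> split(Map<String, List<Integer>> index) {
        Map<Character, Map<String, List<Integer>>> parts = new TreeMap<>();
        for (String term : index.keySet()) {
            if (term.isEmpty()) {
                continue; // no first character to group by
            }
            char c = term.charAt(0);
            if (!parts.containsKey(c)) {
                Map<String, List<Integer>> fresh = new TreeMap<>();
                addIndex(fresh, filteredView(index, c));
                parts.put(c, fresh);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = new TreeMap<>();
        index.put("apple", Arrays.asList(1, 3));
        index.put("avocado", Arrays.asList(2));
        index.put("banana", Arrays.asList(1));
        System.out.println(split(index));
    }
}
```

The point of the design is that the splitting logic never touches the on-disk layout: the source is only read through a filtering view, and each output index is written through the normal add path, so file pointers are not handled by hand.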