From: "Erick Erickson"
To: java-user@lucene.apache.org
Date: Mon, 3 Sep 2007 16:14:08 -0400
Subject: Re: Indexing in pieces?
Message-ID: <359a92830709031314l68e8668fj612a2784eebfe4c1@mail.gmail.com>

See below.

On 8/31/07, Berlin Brown wrote:
>
> So I am assuming that is not just a matter of "indexing" to that same
> directory as you "indexed" before.

No, that's all it is. When you open an index for writing, there is a
flag indicating "overwrite or append". So if you can select just the
records from your database that aren't already in your index, you can
easily add only the new ones. This assumes that each message is a
Lucene document.

> So, based on what you are saying, you would have to reload the
> previous index (eg, INDEX_DIR_OLD) and then index the new content.
> When I say "index", I am talking about actually invoking lucene to
> merge the content.
>
> For example, it isn't just a matter of indexing to index_dir_old and
> then to index_dir_new and then copying the lucene index files into
> another directory index_dir_cur.

You don't need to copy that much. Just open the current index and
append more records.
You can still search the index even as you are adding new documents,
although you'll have to close and reopen your *reader* to see the new
content.

On 8/31/07, Chris Lu wrote:
> > I think you can simply change your SQL to select only the recently
> > updated messages, and add them to your existing index. Although
> > adding to an existing large index also takes a long time, it should
> > be quicker than rebuilding the whole index.
> >
> > If your index continues to grow, you may need a dedicated server
> > for indexing and searching.
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> >
> > On 8/31/07, bbrown wrote:
> > >
> > > I have been fine going from my database (discussion forum) to
> > > lucene. I am taking the simplest approach, e.g. I have a
> > > discussion forum whose content is just text messages; I take
> > > those out of the database and then index the content.
> > >
> > > I am having trouble because I have hundreds of thousands of
> > > messages and it takes a while, eating my server cpu. I was
> > > thinking I would just index, say, a portion of the database. For
> > > example, index records 1-100 and then 101-200. Can I just index
> > > to that index directory without deleting the existing index
> > > segment files that are already there? Or is it more complicated
> > > than that?
> > >
> > > --
> > > Berlin Brown
> > > [berlin dot brown at gmail dot com]
> > > http://botspiritcompany.com/botlist/?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> Berlin Brown
> http://www.newspiritcompany.com - newspirit technologies
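
For what it's worth, the "index records 1-100 and then 101-200" idea
from the thread can be sketched without any Lucene calls at all. The
following is only a rough illustration, not the list's recommended
code: the class IncrementalIndexer, its Message type, and the methods
selectNewer and indexNextBatch are all hypothetical names, the List
stands in for the forum database table, and the Map stands in for a
Lucene index that has been opened in append mode rather than
overwrite mode. The only real point is the bookkeeping: remember the
highest message id already indexed and select only rows above it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Lucene or the thread.
class IncrementalIndexer {
    // Stand-in for one forum message row from the database.
    static final class Message {
        final int id;
        final String text;
        Message(int id, String text) { this.id = id; this.text = text; }
    }

    // Stand-in for the existing index: entries are appended, never wiped.
    private final Map<Integer, String> index = new LinkedHashMap<>();
    // High-water mark: the largest message id already in the index.
    private int lastIndexedId = 0;

    // Stand-in for "SELECT ... WHERE id > ? ORDER BY id LIMIT ?".
    // Assumes the table list is already in ascending id order.
    static List<Message> selectNewer(List<Message> table, int afterId, int limit) {
        List<Message> batch = new ArrayList<>();
        for (Message m : table) {
            if (m.id > afterId && batch.size() < limit) {
                batch.add(m);
            }
        }
        return batch;
    }

    // Adds one batch of not-yet-indexed messages; returns how many were added.
    int indexNextBatch(List<Message> table, int batchSize) {
        List<Message> batch = selectNewer(table, lastIndexedId, batchSize);
        for (Message m : batch) {
            index.put(m.id, m.text); // one message = one "document"
            lastIndexedId = Math.max(lastIndexedId, m.id);
        }
        return batch.size();
    }

    int size() { return index.size(); }
    int lastIndexedId() { return lastIndexedId; }

    public static void main(String[] args) {
        List<Message> table = new ArrayList<>();
        for (int id = 1; id <= 250; id++) {
            table.add(new Message(id, "message " + id));
        }
        IncrementalIndexer indexer = new IncrementalIndexer();
        int added;
        // Keep indexing in pieces until nothing new is left.
        while ((added = indexer.indexNextBatch(table, 100)) > 0) {
            System.out.println("added " + added + ", index now holds " + indexer.size());
        }
    }
}
```

Run as written, the loop adds 100, 100, and then 50 messages and
stops. In a real setup the batch size could be tuned so that each run
is short enough not to eat the server cpu, and the high-water mark
would need to be stored somewhere that survives restarts.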