Date: Thu, 28 Sep 2006 15:25:44 -0700
From: "Chris Lu"
To: java-user@lucene.apache.org
Subject: Re: Indexing large index with Lucene

I like the approach in your second point, but I have doubts about the first one.

For a production-level index, which is usually quite large, frequently closing and reopening the searcher may not be fast enough, especially when you want to cache sorting data, because that cache has to be rebuilt for every newly opened searcher. It's better to keep the searchers open.

But while the indexing process is running, the index files are changing. The open searcher's segment information becomes outdated, and "read past EOF" exceptions will occur.

So for a big index, it's better to keep two copies of the index, one for searching and one for indexing, and hot-swap them when indexing is done.

This is what we did in DBSight. We no longer get "read past EOF" exceptions or corrupted indexes.

Chris Lu
---------------------------
Full-Text Search on Any Applications/Databases
http://www.dbsight.net

On 9/28/06, Erick Erickson wrote:
> Two things come to mind...
>
> First, you can freely write to an index while searching it; the search is
> always available. I'm pretty sure this includes deleting and re-adding
> documents. However, you won't be able to search on the changes in your
> index until you close/reopen the *searcher*.
>
> Second, depending on how quickly you need updates, you could always make a
> *copy* of your index, update that, and then move it back to where your
> searcher looks for it, sort of a batch process really.
> It all depends upon how quickly you need to see the changes.
>
> Hope this helps
> Erick
>
> On 9/28/06, Eric Louvard wrote:
> >
> > I have been using Lucene for several years. We have had to index more
> > and more documents.
> >
> > I'm now trying to optimise the indexing process for more than 1,000,000
> > documents, and I can see that performance decreases as the index grows.
> > I would like to know if anyone has already studied this case.
> >
> > It's an interactively maintained index, and the initial indexing run is
> > my biggest problem.
> >
> > - A document contains several attributes.
> > - I can't block the index during the indexing process (search must
> >   always be available).
> > - I need to delete the older version of a document when I receive a
> >   newer one.
> >
> > Thank you in advance for telling me about your personal experience.
> >
> > Éric Louvard.
> >
> > --
> > Best regards,
> >
> > p.p. Éric Louvard
> > HAUK & SASKO Ingenieurgesellschaft mbH
> > Zettachring 2
> > D-70567 Stuttgart
> >
> > Phone: +49 7 11 7 25 89 - 19
> > Fax: +49 7 11 7 25 89 - 50
> > E-Mail: eric.louvard@hauk-sasko.de
> > www: www.hauk-sasko.de
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
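The two-copy, hot-swap approach Chris describes can be sketched in plain Java. This is a minimal sketch under stated assumptions, not DBSight's implementation: `IndexCopy` and `LiveIndex` are hypothetical names, `IndexCopy` stands in for an open searcher over one index directory (in a real Lucene setup it would wrap an `IndexSearcher`), and the reference counting, which keeps an old copy open until its last in-flight search finishes, is an addition the original post does not describe.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Minimal sketch of the "two copies, hot-swap" pattern (hypothetical names).
 * IndexCopy stands in for an open searcher over one index directory;
 * it is NOT a Lucene class.
 */
public class HotSwapDemo {

    /** Hypothetical handle to one open, read-only copy of the index. */
    static final class IndexCopy {
        final String path;
        // Starts at 1: the reference owned by LiveIndex while this copy is live.
        private final AtomicInteger refs = new AtomicInteger(1);
        volatile boolean closed = false;

        IndexCopy(String path) {
            this.path = path;
        }

        /** Adds a reference unless the copy has already been fully released. */
        boolean tryIncRef() {
            while (true) {
                int n = refs.get();
                if (n == 0) {
                    return false; // already released; caller retries with the newer copy
                }
                if (refs.compareAndSet(n, n + 1)) {
                    return true;
                }
            }
        }

        /** Drops a reference; the last one out "closes" the copy. */
        void decRef() {
            if (refs.decrementAndGet() == 0) {
                // A real version would close the underlying searcher here.
                closed = true;
            }
        }
    }

    /** Publishes the copy that searches should use right now. */
    static final class LiveIndex {
        private final AtomicReference<IndexCopy> live;

        LiveIndex(IndexCopy initial) {
            live = new AtomicReference<>(initial);
        }

        /** Borrows the current copy for one search; pair with release(). */
        IndexCopy acquire() {
            while (true) {
                IndexCopy c = live.get();
                if (c.tryIncRef()) {
                    return c;
                }
                // Lost a race with swap(); loop and pick up the newer copy.
            }
        }

        void release(IndexCopy c) {
            c.decRef();
        }

        /** Makes a fully built copy live; the old one closes after its last search. */
        void swap(IndexCopy rebuilt) {
            IndexCopy old = live.getAndSet(rebuilt);
            old.decRef(); // drop LiveIndex's own reference to the old copy
        }
    }

    public static void main(String[] args) {
        IndexCopy a = new IndexCopy("index-a");
        LiveIndex index = new LiveIndex(a);

        IndexCopy inFlight = index.acquire();  // a search starts on copy A
        index.swap(new IndexCopy("index-b"));  // indexing finished on copy B
        System.out.println(a.closed);          // false: the search still holds A
        index.release(inFlight);               // the search finishes
        System.out.println(a.closed);          // true: A's directory is free for the next indexing run

        IndexCopy next = index.acquire();      // new searches see copy B
        System.out.println(next.path);         // index-b
        index.release(next);
    }
}
```

Searches wrap each query in `acquire()`/`release()`, so they always run against one complete copy, while the indexer builds the other copy offline and calls `swap()` only once it is finished. Because the old copy is closed only after its last borrower releases it, no search reads files that are being rewritten, provided the indexer waits for the old copy to close before writing to that directory again.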