From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: How do we limit the growth of a Lucene Index?
Date: Mon, 5 Nov 2007 07:13:22 -0500

You could search this list's archives for discussions on distributing
your indexes, etc. RemoteSearchable may be handy, but you will probably
have to build some infrastructure around it for handling failover, etc.
(That would make for a nice contribution.)

How often do you think archived data will need to be accessed? And how
much data are you talking about? It seems to me that the main issue will
be managing the searchers, given the potentially large number of
indexes. Just thinking out loud, though.

-Grant

On Nov 4, 2007, at 1:48 PM, Sandeep Mahendru wrote:

> Hi,
>
> We have been developing an enterprise logging service at Wachovia
> bank. The logs (business, application, error) for all the
> bank-related applications are consolidated at one single location in
> an Oracle 10g database.
>
> In our second phase, we are now building a high-performing report
> viewer over it. Our search algorithm does not go to the Oracle 10g
> DB, so we avoid network and I/O. Our search algorithm now goes to a
> Lucene index. We have Lucene indexes created for each application.
> These indexes are present on the same machine where the search
> algorithm runs. As more applications at the bank are now beginning
> to consume this service, the Lucene index is growing.
>
> One of my team leads has suggested the following approach to resolve
> this issue:
>
> *I think the best approach to restrict the index size is to keep it
> for some limited time and then archive it. In case a user wants to
> search against the old files, we might need to provide some
> configuration with which the Lucene searcher can point to the
> archived file and search the content. To implement this we need to
> rename the index file with from and to dates before it is archived.
> While searching against the older files, the user needs to provide
> the date range, and then the app can point to the relevant archived
> index files for search. Let me know your thoughts on this.*
>
> At present this sounds the most logical to me. But then we begin to
> store the Lucene indexes on a different machine. This might again
> cause the search algorithm to make a network trip if the search is
> based on old archived data.
>
> Is there a better design to resolve the above concern? Does Lucene
> provide some sort of API to handle the above scenarios?
>
> Regards,
> Sandeep.

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
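
The date-range lookup proposed in the quoted message (rename each archived index with its from and to dates, then pick the relevant archives for a user-supplied range) could be sketched roughly as follows. Nothing here comes from the thread itself: the class name, the `<application>_<from>_<to>` naming pattern, and the yyyyMMdd date format are all assumptions made for illustration, and the sketch only selects archive names. Opening the selected indexes and combining them (for instance with a MultiSearcher, or RemoteSearchable for indexes on another machine) is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: chooses archived index directories whose date range
// overlaps the range a user asks for. The naming convention is an assumption.
class ArchivedIndexSelector {
    // Assumed pattern: <application>_<from>_<to>, dates written as yyyyMMdd.
    // Application names are assumed not to contain underscores.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    // Builds the name an index would be given when it is archived.
    static String archiveName(String app, LocalDate from, LocalDate to) {
        return app + "_" + from.format(FMT) + "_" + to.format(FMT);
    }

    // Returns the archives of one application that overlap [queryFrom, queryTo].
    // Names that do not have three underscore-separated parts are skipped;
    // a malformed date in a matching name raises DateTimeParseException.
    static List<String> select(List<String> archiveNames, String app,
                               LocalDate queryFrom, LocalDate queryTo) {
        List<String> hits = new ArrayList<>();
        for (String name : archiveNames) {
            String[] parts = name.split("_");
            if (parts.length != 3 || !parts[0].equals(app)) {
                continue;
            }
            LocalDate from = LocalDate.parse(parts[1], FMT);
            LocalDate to = LocalDate.parse(parts[2], FMT);
            // Two closed ranges overlap when neither ends before the other starts.
            if (!from.isAfter(queryTo) && !to.isBefore(queryFrom)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> archives = Arrays.asList(
                archiveName("payments", LocalDate.of(2007, 1, 1), LocalDate.of(2007, 3, 31)),
                archiveName("payments", LocalDate.of(2007, 4, 1), LocalDate.of(2007, 6, 30)),
                archiveName("loans", LocalDate.of(2007, 1, 1), LocalDate.of(2007, 3, 31)));
        // Archives a searcher would need to open for a mid-March to mid-April query.
        System.out.println(select(archives, "payments",
                LocalDate.of(2007, 3, 15), LocalDate.of(2007, 4, 10)));
    }
}
```

Keeping the selection step separate from the search step means only the archives that overlap the requested range have to be opened, which limits the network trips to the archive machine that the original question worries about.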