From: Grant Ingersoll
Subject: Re: Multiple Indices vs Single Index
Date: Thu, 20 Sep 2007 11:55:01 -0400
To: java-user@lucene.apache.org

OK, I thought you meant your index would have in it the name of
the second index and would thus do a two-stage retrieval.

At any rate, if you are saying your combined index with all the stored
fields is ~3.4 GB, I would think it would fit on the machine you have
and perform reasonably. Naturally, this depends on your application,
your users, etc., and I can't make any guarantees, but I certainly
recall others managing this size just fine. See the many tips on
improving searching and indexing on the Wiki (links at the bottom, in
my signature) and do some profiling/testing.

When you said your tests were inconclusive, what tests have you done?
If you can, run the tests in a profiler to see where your bottlenecks
are.

-Grant

On Sep 20, 2007, at 11:16 AM, Nikhil Chhaochharia wrote:

> I am sorry, it seems that I was not clear about what my problem is.
> I will try to describe it again.
>
> My data is divided into 40 categories and at one time only one
> category can be searched. The GUI for the system will ask the user
> to select the category from a drop-down. Currently, I have a
> separate index for every category. The index sizes vary - one
> category index is 10MB and another is 700MB. Other index sizes are
> somewhere in between.
>
> I was wondering if it will be better to just have 1 large index
> with all the 40 indices combined. I do not need to do dual-queries
> and my total index size (if I create a single index) is about
> 3.4GB. It will increase to a maximum of 5-6 GB. I am running this
> on a dedicated machine with 8GB RAM.
>
> Unfortunately I do not have enough hardware to run both in parallel
> and test properly. I have just one server, which is being used by
> live users. So it would be great if you could tell me whether I
> should stick with my 40 indices or combine them into 1 index. What
> are the pros and cons of each approach?
>
> Thanks,
> Nikhil
>
>
> ----- Original Message ----
> From: Grant Ingersoll
> To: java-user@lucene.apache.org
> Sent: Thursday, 20 September, 2007 7:57:21 PM
> Subject: Re: Multiple Indices vs Single Index
>
> If I understand correctly, you want to do a two-stage retrieval,
> right? That is, look up in the initial index (3.4 GB) and then do a
> second search on the sub-index? Presumably, you have to manage the
> Searchers, etc. for each of the sub-indexes as well as the big
> index. This means you have to go through the hits from the first
> search, then route, etc., correct?
>
> Have you tried creating one single index with all the (stored)
> fields, etc.? Worst case scenario, assuming 1GB per index, is you
> would have a 40GB index, but my guess is index compression will
> reduce it more. Since you are less than that anyway, have you tried
> just the straightforward solution? Or do you have other requirements
> that force the sub-index solution? Also, I am not sure it will work,
> but it seems worth a try. Of course, this also depends on how much
> you expect your indexes to grow.
>
> Also, what was inconclusive about your tests? Maybe you can describe
> more what you have tried to date?
>
> Cheers,
> Grant
>
> On Sep 20, 2007, at 3:50 AM, Nikhil Chhaochharia wrote:
>
>> Hi,
>>
>> I have about 40 indices which range in size from 10MB to 700MB.
>> There are quite a few stored fields. To get an idea of the
>> document size, I have about 400k documents in the 700MB index.
>>
>> Depending on the query, I choose the index which needs to be
>> searched. Each query hits only one index. I was wondering if
>> creating a single index where every document will have the
>> indexname as a field will be more efficient. I created such an
>> index and it was 3.4 GB in size. My initial performance tests with
>> it are not conclusive.
>>
>> Also, what are the other points to be addressed while deciding
>> between 1 index and 40 indices?
>>
>> I have 8GB RAM on the machine.
>>
>> Thanks,
>> Nikhil

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
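
For illustration only, the two layouts discussed in this thread (one
index per category versus one combined index where every document
carries the index name as a field) can be sketched in plain Java. This
is a toy model under stated assumptions: ordinary collections stand in
for Lucene indexes, substring matching stands in for a real query, and
the names CategoryIndexSketch, Doc, searchPerCategory, searchCombined
and splitByCategory are hypothetical. No Lucene API is used or implied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: plain collections stand in for Lucene indexes. */
final class CategoryIndexSketch {

    /** A document whose category is kept as an ordinary field. */
    static final class Doc {
        final String category;
        final String text;

        Doc(String category, String text) {
            this.category = category;
            this.text = text;
        }
    }

    /** Layout A: one "index" per category; the category selects which one to scan. */
    static List<String> searchPerCategory(Map<String, List<Doc>> indexes,
                                          String category, String term) {
        List<String> hits = new ArrayList<>();
        List<Doc> index = indexes.get(category);
        if (index == null) {
            return hits; // unknown category: nothing to search
        }
        for (Doc doc : index) {
            if (doc.text.contains(term)) {
                hits.add(doc.text);
            }
        }
        return hits;
    }

    /** Layout B: one combined "index"; the category becomes a required condition. */
    static List<String> searchCombined(List<Doc> combined,
                                       String category, String term) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : combined) {
            if (doc.category.equals(category) && doc.text.contains(term)) {
                hits.add(doc.text);
            }
        }
        return hits;
    }

    /** Derives layout A from layout B so both can be compared on the same data. */
    static Map<String, List<Doc>> splitByCategory(List<Doc> combined) {
        Map<String, List<Doc>> indexes = new HashMap<>();
        for (Doc doc : combined) {
            indexes.computeIfAbsent(doc.category, k -> new ArrayList<>()).add(doc);
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<Doc> combined = Arrays.asList(
                new Doc("books", "lucene in action"),
                new Doc("books", "java concurrency"),
                new Doc("music", "lucene blues"));
        Map<String, List<Doc>> perCategory = splitByCategory(combined);

        // Both layouts return the same hits for a single-category search.
        System.out.println(searchPerCategory(perCategory, "books", "lucene"));
        System.out.println(searchCombined(combined, "books", "lucene"));
    }
}
```

Both layouts return the same hits here. The sketch says nothing about
which one is faster on real data; as suggested above, that still needs
profiling on the actual indexes.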