From: "Ganesh"
Date: Tue, 14 Jun 2011 13:12:27 +0530
Subject: Re: Index size and performance degradation
Reply-To: java-user@lucene.apache.org
Message-ID: <7F4D8D98DAFB42F3A1CE30E74FCBA50B@sv.us.sonicwall.com>
References: <4DF3C40F.9090703@code972.com>

We tried more than 50 shards on a single system. Having multiple small
indexes makes indexing and optimizing the content faster. We use
ParallelMultiSearcher to search across the indexes, and the performance
is really good. Now we plan to move to 64-bit so that we can use more
RAM.

Regards
Ganesh

----- Original Message -----
From: "Shai Erera"
Sent: Sunday, June 12, 2011 9:13 AM
Subject: Re: Index size and performance degradation

> I agree w/ Erick, there is no cutoff point (index size for that matter)
> above which you start sharding.
>
> What you can do is create a scheduled job in your system that runs a
> select list of queries and monitors their performance. Once performance
> degrades, the job shards the index by either splitting it (you can use
> IndexSplitter under contrib) or creating a new shard and directing new
> documents to it.
>
> I think I read somewhere, not sure if it was in Solr or ElasticSearch
> documentation, about a Balancer object, which moves shards around in
> order to balance the load on the cluster. You can implement something
> similar which tries to balance the index sizes, creates new shards
> on-the-fly, and even merges shards if a whole source is suddenly
> removed from the system, etc.
>
> Also, note that the 'largest index size' threshold is really a machine
> constraint and not Lucene's. So if you decide that 10 GB is your cutoff,
> it is pointless to create 10x10GB shards on the same machine --
> searching them is just like searching a 100GB index w/ 10x10GB
> segments. Perhaps it's even worse because you consume more RAM when the
> indexes are split (e.g., terms index, field infos etc.).
>
> Shai
>
> On Sun, Jun 12, 2011 at 3:10 AM, Erick Erickson wrote:
>
>> <<<We can't assume anything about the machine running it,
>> so testing won't really tell us much>>>
>>
>> Hmmm, then it's pretty hopeless I think.
>> Problem is that
>> anything you say about running on a machine with
>> 2G available memory on a single processor is completely
>> incomparable to running on a machine with 64G of
>> memory available for Lucene and 16 processors.
>>
>> There's really no such thing as an "optimum" Lucene index
>> size, it always relates to the characteristics of the
>> underlying hardware.
>>
>> I think the best you can do is actually test on various
>> configurations, then at least you can say "on configuration
>> X this is the tipping point".
>>
>> Sorry there isn't a better answer that I know of, but...
>>
>> Best
>> Erick
>>
>> On Sat, Jun 11, 2011 at 3:37 PM, Itamar Syn-Hershko wrote:
>> > Hi all,
>> >
>> > I know Lucene indexes to be at their optimum up to a certain size -
>> > said to be around several GBs. I haven't found a good discussion
>> > over this, but it's my understanding that at some point it's better
>> > to split an index into parts (a la sharding) than to continue
>> > searching on a huge-size index. I assume this has to do with OS and
>> > IO configurations. Can anyone point me to more info on this?
>> >
>> > We have a product that is using Lucene for various searches, and at
>> > the moment each type of search is using its own Lucene index. We
>> > plan on refactoring the way it works and to combine all indexes into
>> > one - making the whole system more robust and with a smaller memory
>> > footprint, among other things.
>> >
>> > Assuming the above is true, we are interested in knowing how to do
>> > this correctly. Initially all our indexes will be run in one big
>> > index, but if at some index size there is a severe performance
>> > degradation we would like to handle that correctly by starting a new
>> > FSDirectory index to flush into, or by re-indexing and moving large
>> > indexes into their own Lucene index.
>> >
>> > Are there any guidelines for measuring or estimating this correctly?
>> > What should we be aware of while considering all that? We can't
>> > assume anything about the machine running it, so testing won't
>> > really tell us much...
>> >
>> > Thanks in advance for any input on this,
>> >
>> > Itamar.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
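
Shai's suggestion in the quoted thread, a scheduled job that runs a select
list of queries and monitors their performance, could look roughly like
the sketch below. It is only an illustration under stated assumptions:
the class QueryLatencyMonitor and all of its methods are hypothetical
names, the probe queries are plain Runnables standing in for real Lucene
searches, and the window size and slowdown factor are values you would
have to choose per machine. The sketch only detects degradation; what to
do next (split the index, start a new shard) is left to the callback.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks probe-query latency and flags degradation. */
final class QueryLatencyMonitor {
    private final int window;     // number of runs averaged together
    private final double factor;  // assumed threshold, e.g. 2.0 = twice the baseline
    private final Deque<Long> recent = new ArrayDeque<Long>();
    private long baselineSum = 0;
    private int baselineCount = 0;

    QueryLatencyMonitor(int window, double factor) {
        if (window <= 0 || factor <= 1.0) {
            throw new IllegalArgumentException("window must be > 0 and factor > 1.0");
        }
        this.window = window;
        this.factor = factor;
    }

    /** Records one run; the first `window` runs form the baseline. */
    synchronized void record(long elapsedNanos) {
        if (baselineCount < window) {
            baselineSum += elapsedNanos;
            baselineCount++;
            return;
        }
        recent.addLast(elapsedNanos);
        if (recent.size() > window) {
            recent.removeFirst(); // keep only the latest `window` runs
        }
    }

    /** True once the latest `window` runs are slower than factor x baseline. */
    synchronized boolean isDegraded() {
        if (baselineCount < window || recent.size() < window) {
            return false; // not enough data yet
        }
        long recentSum = 0;
        for (long v : recent) {
            recentSum += v;
        }
        // Both sums cover exactly `window` runs, so comparing sums compares means.
        return recentSum > factor * baselineSum;
    }

    /** Runs every probe query once and returns the total elapsed nanoseconds. */
    static long timeQueries(List<Runnable> queries) {
        long start = System.nanoTime();
        for (Runnable q : queries) {
            q.run();
        }
        return System.nanoTime() - start;
    }

    /** Schedules the probe run; onDegraded fires on every run while degraded. */
    ScheduledFuture<?> schedule(ScheduledExecutorService executor,
                                final List<Runnable> queries,
                                long period, TimeUnit unit,
                                final Runnable onDegraded) {
        return executor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                record(timeQueries(queries));
                if (isDegraded()) {
                    onDegraded.run();
                }
            }
        }, 0, period, unit);
    }
}
```

To use it, one would wrap each representative search in a Runnable, pass
the list to schedule() together with an executor from
Executors.newSingleThreadScheduledExecutor(), and put the sharding
decision in onDegraded. As Erick points out in the thread, the baseline
is only meaningful for the machine it was measured on, which is why the
sketch learns it at runtime instead of hard-coding an index size.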