Message-ID: <4DF715C3.9000209@code972.com>
Date: Tue, 14 Jun 2011 11:03:15 +0300
From: Itamar Syn-Hershko
To: java-user@lucene.apache.org
Subject: Re: Index size and performance degradation
References: <4DF3C40F.9090703@code972.com>
 <4DF47461.1050004@code972.com> <1308036518.853.79.camel@te-prime>
In-Reply-To: <1308036518.853.79.camel@te-prime>

Thanks. Our product is pretty generic, and we can't assume much about
the hardware or about usage. Some users will want low latency; others
will prefer throughput. My job is to compromise as little as possible...

As for SSDs, that's generally good advice, except that they seem to fail
quite a lot. For example, see:
http://www.codinghorror.com/blog/2011/05/the-hot-crazy-solid-state-drive-scale.html

On 14/06/2011 10:28, Toke Eskildsen wrote:
> On Sun, 2011-06-12 at 10:10 +0200, Itamar Syn-Hershko wrote:
>> The whole point of my question was to find out if and how to do
>> balancing on the SAME machine. Apparently that's not going to help, and
>> at a certain point we will just have to prompt the user to buy more
>> hardware...
> It really depends on your scenario. If you have few concurrent requests
> and are looking to minimize latency, sharding might help, assuming you
> have fast IO and multiple cores. You basically want to saturate all
> available resources for all requests.
>
> On the other hand, if throughput is the issue, sharding on a single
> machine is counter-productive due to increased duplication and merging.
>
>> Out of curiosity, isn't there anything that we can do to avoid that?
>> For instance, using memory-mapped files for the indexes? Anything that
>> would help us overcome OS limitations of that sort...
> One standard piece of advice for speeding up searches is to use SSDs.
> Our (admittedly old) experiments put SSD performance near RAM. With the
> prices we have now, SSDs seem like an obvious choice for most setups.
>
> We tried a few performance tests at different index sizes and for us,
> index size vs. performance looked like a power law: heavy performance
> degradation in the beginning, less later.
> It makes sense when we look at caching, and it means that if you do not
> require stellar performance, you can have very large indexes on few
> machines (cue Hathi Trust).
>
> - Toke Eskildsen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
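
An illustrative aside on the single-machine sharding trade-off discussed in the quoted text. The sketch below is a toy model in Python, not Lucene code: every name in it (SHARDS, search_shard, search_all, the in-memory documents and their scores) is hypothetical. It only shows the two steps that sharding adds to each query: a fan-out across shards, which can lower latency when cores and IO are otherwise idle, and a merge of the per-shard top hits, which is extra work on every request. Python threads will not speed up this CPU-bound toy; the point is the shape of the work, not the timing.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical toy shards: each is a list of (doc_id, text, score) tuples.
# The scores are made up for illustration; a real engine computes them per query.
SHARDS = [
    [(1, "lucene index size", 0.9), (2, "ssd performance", 0.4)],
    [(3, "index merging cost", 0.7), (4, "memory mapped index", 0.8)],
    [(5, "hardware sizing", 0.3), (6, "index sharding", 0.6)],
]


def search_shard(shard, term, k):
    """Return up to k (score, doc_id) hits from one shard, best first."""
    hits = [(score, doc_id) for doc_id, text, score in shard
            if term in text.split()]
    return heapq.nlargest(k, hits)


def search_all(shards, term, k, pool):
    """Fan the query out to every shard, then merge the per-shard top-k lists."""
    # Fan-out: one task per shard, so idle cores can work on the same request.
    per_shard = list(pool.map(lambda shard: search_shard(shard, term, k), shards))
    # Merge: each list is already sorted best first, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(islice(merged, k))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        print(search_all(SHARDS, "index", 3, pool))
```

With many concurrent requests the cores are already busy, so the fan-out buys nothing and the merge step is pure overhead, which matches the point made above about throughput.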