Date: Tue, 30 Jun 2009 22:59:40 +0200
Subject: Re: Scaling out/up or a mix
From: Marcus Herou
To: java-user@lucene.apache.org,
te@statsbiblioteket.dk

Hi.

The number of concurrent users today is insignificant, but once we push the service we will get into trouble. I know that because even one simple faceting query (which we will use to display trend graphs) can take forever (I'm talking about Solr, btw).

The timing of "normal" Lucene queries (title:blah OR description:blah) is reasonable for the current hardware but not good (currently 8 machines with 2 GB RAM each, serving a 130 GB index). Queries take less than 10 seconds at all times, which is of course still a very bad user experience.

If anyone needs to understand more about the nature of this app: I think we are quite similar to Technorati (if we were to show all the bling-bling) or twingly.com. Basically, it's a blog search app.

Example of a public query (sorted by relevance rather than by publishedDate, which is faster):
http://blogsearch.tailsweep.com/search.do?wa=test&la=all

And while you are at it, have a look at our cool BlogSpace:
http://blogsearch.tailsweep.com/showFeed.do?feedId=114799

Sorry, I don't mean to advertise, but I couldn't help it :)

//Marcus

On Tue, Jun 30, 2009 at 10:49 AM, Toke Eskildsen wrote:
> On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> > Index size (and growing): 16G x 8 = 128G
> > Doc size (data): 20k
> > Num docs: 90M
> > Num users: A few hundred, but most critical is the admin staff, who
> > use the index all day long.
> > Query types: Example: title:"Iphone" OR description:"Iphone" sorted by
> > publishedDate... = Very simple, no fuzzy searches etc. However, since the
> > dataset is large, I guess sorting will consume memory.
> >
> > Couldn't one draw any conclusions about best practice in terms of hardware
> > given the above "specs"?
>
> Can you give us an estimate of the number of concurrent searches in
> prime time, and in what range a satisfactory response time would be?
>
> Going for a fully RAM-based search on a corpus of this size would mean
> that each machine holds about 30GB of index (taken from your hardware
> suggestion). I would expect such a machine to be able to serve
> something like 500-1000 searches/second (highly dependent on the index
> and the searches, but what you're describing sounds simple enough), if we
> just measure the raw search time and the lookup of one or two fields for
> the first 20 hits. Is that what you're aiming for?
>
> Wrapping in web services and such lowers the number of searches that can
> be performed, which makes the RAM option even more expensive relative to
> a hard disk or SSD solution.
>
> > I mean, it is very simple: let's say someone gives me a budget of 50 000 USD
> > and I then want to get the most bang for the buck for my workload.
>
> I am a bit unclear on your overall goal. Do you expect the number of
> users to grow significantly?

-- 
Marcus Herou
CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
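As a rough aid to the sizing question in this thread, the figures quoted above (90M docs, 8 shards of 16 GB, 2 GB RAM per machine, and Toke's ~30 GB of index per machine for a RAM-based setup) can be turned into per-shard numbers with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and the sort-cache cost is an assumption (publishedDate sorted as a long, i.e. one 8-byte value per document); a string-sorted field would cost differently. The RAM ratio also ignores JVM and OS overhead.

```java
import java.util.Locale;

// Back-of-envelope sizing for the setup described in this thread (hypothetical helper).
// ASSUMPTION: publishedDate is sorted as a long, so the sort cache holds 8 bytes per doc.
final class IndexSizing {

    /** Documents per shard when the corpus is split evenly (rounded up). */
    static long docsPerShard(long totalDocs, int shards) {
        return (totalDocs + shards - 1) / shards;
    }

    /** Assumed sort-cache cost: one 8-byte long per document in the shard. */
    static long sortCacheBytes(long docsInShard) {
        return docsInShard * Long.BYTES;
    }

    /** Naive RAM-to-shard ratio in percent, ignoring JVM and OS overhead. */
    static double ramToShardPercent(long ramGb, long shardGb) {
        return 100.0 * ramGb / shardGb;
    }

    /** Machines needed if each keeps ramPerMachineGb of index in RAM (rounded up). */
    static long machinesForFullRam(long indexGb, long ramPerMachineGb) {
        return (indexGb + ramPerMachineGb - 1) / ramPerMachineGb;
    }

    public static void main(String[] args) {
        long totalDocs = 90_000_000L; // "Num docs: 90M"
        int shards = 8;               // 8 machines ...
        long shardGb = 16;            // ... each holding 16 GB of index
        long ramGb = 2;               // current RAM per machine
        long ramPerMachineGb = 30;    // ~30 GB per machine for RAM-based search

        long perShard = docsPerShard(totalDocs, shards);
        System.out.println("docs per shard: " + perShard);
        System.out.println("sort cache per shard (MiB): "
                + sortCacheBytes(perShard) / (1024 * 1024));
        System.out.printf(Locale.ROOT, "RAM-to-shard ratio: %.1f%%%n",
                ramToShardPercent(ramGb, shardGb));
        System.out.println("machines for a fully RAM-based index: "
                + machinesForFullRam(shards * shardGb, ramPerMachineGb));
    }
}
```

With these inputs the sketch reports 11,250,000 docs per shard, about 85 MiB of sort cache per shard, a RAM-to-shard ratio of 12.5%, and 5 machines for a fully RAM-based 128 GB index. Under the 8-byte assumption the sort cache is small next to the index itself; the larger gap is that each 2 GB machine can hold only a small fraction of its 16 GB shard in memory.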