Subject: Re: Scaling out/up or a mix
From: Toke Eskildsen <te@statsbiblioteket.dk>
Reply-To: te@statsbiblioteket.dk
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
In-Reply-To: <7e536b1f0906290047g14322a5bm55f6740090fd32d2@mail.gmail.com>
References: <7e536b1f0906261500m297efb0cv107e2b2c5cd94ac3@mail.gmail.com>
	 <7e536b1f0906281413m276606ccyca58036de05708b6@mail.gmail.com>
	 <4A4864E7.3070609@boboco.ie>
	 <7e536b1f0906290047g14322a5bm55f6740090fd32d2@mail.gmail.com>
Content-Type: text/plain
Organization: Statsbiblioteket
Date: Tue, 30 Jun 2009 10:49:12 +0200
Message-ID: <1246351752.3464.18.camel@pc286>
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote: 
> Index size(and growing): 16Gx8 = 128G
> Doc size (data): 20k
> Num docs: 90M
> Num users: Few hundred but most critical is that the admin staff which is
> using the index all day long.
> Query types: Example: title:"Iphone" OR description:"Iphone" sorted by
> publishedDate... = Very simple, no fuzzy searches etc. However since the
> dataset is large it will consume memory on sorting I guess.
> 
> Could not one draw any conclusions about best-practice in terms of hardware
> given the above "specs" ?

Can you give us an estimate of the number of concurrent searches in
prime time and in what range a satisfactory response time would be?

Going for a fully RAM-based search on a corpus of this size would mean
that each machine holds about 30GB of index (taken from your hardware
suggestion). I would expect that such a machine would be able to serve
something like 500-1000 searches/second (highly dependent on the index
and the searches, but what you're describing sounds simple enough) if we
just measure the raw search time and lookup of one or two fields for the
first 20 hits. Is that what you're aiming for?
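
As a rough sanity check on the sizing above, the arithmetic can be
sketched in a few lines of Python. The figures (16G x 8 of index, 90M
documents, about 30GB of index per machine) are taken from this thread;
the function names are made up for illustration, and the sketch ignores
JVM heap overhead, index growth and any memory needed for sorting.

```python
import math

def machines_for_ram_index(total_index_gb, index_gb_per_machine):
    """Machines needed so that every part of the index fits in RAM."""
    return math.ceil(total_index_gb / index_gb_per_machine)

def index_bytes_per_doc(total_index_gb, num_docs):
    """Average index footprint per document (1 GB taken as 10**9 bytes)."""
    return total_index_gb * 10**9 / num_docs

total_index_gb = 16 * 8      # "16Gx8 = 128G", as quoted in the thread
num_docs = 90_000_000        # "Num docs: 90M", as quoted in the thread
per_machine_gb = 30          # index held per machine, from the hardware suggestion

machines = machines_for_ram_index(total_index_gb, per_machine_gb)
print(machines)              # 5
print(round(index_bytes_per_doc(total_index_gb, num_docs)))
```

With these numbers the index spreads over 5 machines. Since the index
is still growing, that count is a lower bound rather than a plan.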

Wrapping the search in web services and the like lowers the number of
searches that can be performed, which makes the RAM option even more
expensive relative to a hard disk or SSD solution.

> I mean it is very simple: Let's say someone gives me a budget of 50 000 USD
> and I then want to get the most bang for the buck for my workload.

I am a bit unclear on your overall goal. Do you expect the number of
users to grow significantly?

