lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <>
Subject RE: Scaling out/up or a mix
Date Tue, 30 Jun 2009 09:29:45 GMT
> On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> > Index size(and growing): 16Gx8 = 128G
> > Doc size (data): 20k
> > Num docs: 90M
> > Num users: Few hundred but most critical is that the admin staff which
> is
> > using the index all day long.
> > Query types: Example: title:"Iphone" OR description:"Iphone" sorted by
> > publishedDate... = Very simple, no fuzzy searches etc. However since the
> > dataset is large it will consume memory on sorting I guess.
> >
> > Could not one draw any conclusions about best-practice in terms of
> hardware
> > given the above "specs" ?
> Can you give us an estimate of the number of concurrent searches in
> prime time and in what range a satisfactory response time would be?
> Going for a fully RAM-based search on a corpus of this size would mean
> that each machine holds about 30GB of index (taken from your hardware
> suggestion). I would expect that such a machine would be able to serve
> something like 500-1000 searches/second (highly dependent on the index
> and the searches, but what you're describing sounds simple enough) if we
> just measure the raw search time and lookup of one or two fields for the
> first 20 hits. It that what you're aiming for?
> Wrapping in web services and such lowers the number of searches that can
> be performed, which makes the RAM-option even more expensive relative to
> a harddisk or SSD solution.

I would never say: "I copy an index into a RAMDirectory or something like
that". I would buy enough RAM to fit as most as possible into RAM and (as we
for sure are on a 64bit platform) use MMapDirectory instead of
SimpleFSDirectory or NIOFSDirectory (I am talking with Lucene 2.9 class
names, where FSDirs can be simply instantiated, as you may have noticed).
MMapDirectory uses the index like a swap file that is mapped into address
space. The OS kernel's will then use the index like RAM and map it into real
RAM as needed. We had the discussion a lot of times in this mailing list
(search for MMapDirectory in the archives). So the simple answer is always:
If 64 bit platform with lots of RAM, use MMapDirectory. On Windows this is
still buggy (but with 2.9 there is a workaround in MMapDirectory). When you
warm your searchers before (I think you will do...), the Operating system
kernel will "swap" in as much as possible from the index.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message