lucene-java-user mailing list archives

From Marcus Herou
Subject Re: Scaling out/up or a mix
Date Tue, 30 Jun 2009 21:07:46 GMT
Hi, I like the sound of this.

What I am not familiar with in terms of Lucene is how the index gets
swapped in and out of memory. When it comes to database tables
(non-partitionable tables at least) I know that one should have enough memory
to fit the entire index into memory to avoid file-sorts, for instance, which
of course are slow.

If one operates mostly on the most recently added entries, will those always
be mapped into a cache or something?

How should I think about this?

Hmmm... I think now that I need to create a rotation-partition scheme where
I separate cold (old entries) and warm entries...
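
One way to picture such a scheme, as a rough sketch only: route each document to a "warm" or "cold" partition based on its publishedDate. The class name, the partition labels and the idea of a fixed time window are all assumptions made for illustration; nothing here is a Lucene API, and a real setup would also need a job that moves documents from warm to cold as they age.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: PartitionRouter, "warm"/"cold" and the time window
// are illustrative assumptions, not part of Lucene.
class PartitionRouter {
    private final Duration warmWindow;

    PartitionRouter(Duration warmWindow) {
        this.warmWindow = warmWindow;
    }

    // Returns "warm" for documents published within the window ending at
    // 'now', and "cold" for anything older than that.
    String partitionFor(Instant publishedDate, Instant now) {
        Instant cutoff = now.minus(warmWindow);
        return publishedDate.isBefore(cutoff) ? "cold" : "warm";
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(Duration.ofDays(30));
        Instant now = Instant.now();
        System.out.println(router.partitionFor(now.minus(Duration.ofDays(1)), now));
        System.out.println(router.partitionFor(now.minus(Duration.ofDays(365)), now));
    }
}
```

The point of the split would be that the warm partition stays small enough to remain cached, while the cold partitions are only touched by the occasional query that reaches far back in time.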


On Tue, Jun 30, 2009 at 11:29 AM, Uwe Schindler wrote:

> > On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> > > Index size(and growing): 16Gx8 = 128G
> > > Doc size (data): 20k
> > > Num docs: 90M
> > > Num users: A few hundred, but most critical is the admin staff, who
> > > are using the index all day long.
> > > Query types: Example: title:"Iphone" OR description:"Iphone" sorted by
> > > publishedDate... = Very simple, no fuzzy searches etc. However, since the
> > > dataset is large it will consume memory on sorting I guess.
> > >
> > > Could not one draw any conclusions about best-practice in terms of
> > > hardware given the above "specs" ?
> >
> > Can you give us an estimate of the number of concurrent searches in
> > prime time and in what range a satisfactory response time would be?
> >
> > Going for a fully RAM-based search on a corpus of this size would mean
> > that each machine holds about 30GB of index (taken from your hardware
> > suggestion). I would expect that such a machine would be able to serve
> > something like 500-1000 searches/second (highly dependent on the index
> > and the searches, but what you're describing sounds simple enough) if we
> > just measure the raw search time and lookup of one or two fields for the
> > first 20 hits. Is that what you're aiming for?
> >
> > Wrapping in web services and such lowers the number of searches that can
> > be performed, which makes the RAM-option even more expensive relative to
> > a harddisk or SSD solution.
>
> I would never say: "I copy an index into a RAMDirectory or something like
> that". I would buy enough RAM to fit as much as possible into RAM and (as
> we are surely on a 64-bit platform) use MMapDirectory instead of
> SimpleFSDirectory or NIOFSDirectory (I am using the Lucene 2.9 class
> names, where FSDirs can simply be instantiated, as you may have noticed).
> MMapDirectory uses the index like a swap file that is mapped into address
> space. The OS kernel will then use the index like RAM and map it into real
> RAM as needed. We have had this discussion many times on this mailing list
> (search for MMapDirectory in the archives). So the simple answer is always:
> if you are on a 64-bit platform with lots of RAM, use MMapDirectory. On
> Windows this is still buggy (but with 2.9 there is a workaround in
> MMapDirectory). When you warm your searchers beforehand (I think you
> will...), the operating system kernel will "swap" in as much of the index
> as possible.
> Uwe
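
To make the "mapped into address space" idea concrete, here is a minimal sketch using only the JDK's own memory-mapping API (FileChannel.map). It is not Lucene code and not how MMapDirectory is implemented in detail; the class name, the helper method and the temp file are assumptions for illustration. It shows the mechanism described above: the file is never copied onto the Java heap, and the OS pages data in from disk when a mapped address is first touched.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: plain JDK memory mapping, not Lucene code.
class MmapSketch {

    // Maps the whole file read-only and returns the byte at 'offset'.
    // There is no explicit read() into a heap array; the kernel loads the
    // relevant page into RAM on first access and keeps it cached while
    // memory allows. A single mapping is limited to 2 GB in Java, so a
    // large file would need several mappings.
    static byte byteAt(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        try {
            Files.write(tmp, "lucene".getBytes(StandardCharsets.US_ASCII));
            System.out.println((char) byteAt(tmp, 2)); // third byte of the file
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the pages live in the OS cache rather than on the Java heap, the JVM heap can stay small while the kernel decides which parts of the index to keep in RAM, which is why warming the searchers first helps.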

Marcus Herou CTO and co-founder Tailsweep AB
