lucene-dev mailing list archives

From Michael McCandless
Subject Re: Lucene memory usage
Date Thu, 11 Jun 2009 19:50:43 GMT
On Thu, Jun 11, 2009 at 3:21 PM, Jason Rutherglen wrote:
> Makes sense.
> Currently MMapDirectory doesn't write using mapped byte buffers,
> would the memory management of the OS behave differently if we
> were writing to the MMapped bytebuffers as opposed to writing to
> an RAF (like with FSDir)?

I would assume not, but it would be good to confirm (Earwin, where's
your improved MMapDir?).

At the page level it's all basic LRU, and LRU is not a good policy for
deciding which search data structures are best kept RAM resident.
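
To make the LRU point concrete, here is a minimal sketch of a toy page
cache built on java.util.LinkedHashMap in access order.  The class name
LruPageCache and the page ids are hypothetical, and a real OS page cache
is more elaborate than pure LRU, so treat this only as an illustration
of how a single one-pass scan can push hot pages out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a fixed-capacity LRU set of page ids (hypothetical, not OS code).
public class LruPageCache {
    private final LinkedHashMap<Integer, Boolean> pages;

    LruPageCache(final int capacity) {
        // accessOrder = true keeps entries ordered least recently used first.
        this.pages = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Touches a page; returns true on a hit, false on a miss (a simulated disk read).
    boolean touch(int pageId) {
        if (pages.get(pageId) != null) {
            return true;
        }
        pages.put(pageId, Boolean.TRUE);
        return false;
    }

    public static void main(String[] args) {
        LruPageCache cache = new LruPageCache(4);
        for (int p = 0; p < 4; p++) cache.touch(p);      // "hot" search pages 0..3
        for (int p = 100; p < 104; p++) cache.touch(p);  // one-pass sequential scan
        System.out.println(cache.touch(0));              // prints "false": page 0 was evicted
    }
}
```

Pages read once by the scan are treated as just as valuable as pages
that searches hit constantly, which is the weakness described above.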

>> Well... locality is still important. Under the hood, mmap on a
>> page miss must hit the disk.
>
> Maybe this is where MappedByteBuffer.load, as Earwin has
> mentioned, comes in handy?

That's unfortunately a rather blunt tool, and only practical when
available RAM exceeds the index size.  Warming your particular
searches is more precise...
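
For reference, a minimal sketch of what calling MappedByteBuffer.load
looks like, assuming a single file under 2 GB and the java.nio.file API
(Java 7 or later).  The class name MmapPreload and the temp file are
hypothetical stand-ins for an index file; this is not how MMapDirectory
itself is written.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapPreload {
    // Maps the whole file read-only and asks the OS to bring its pages into RAM.
    // load() is best effort: nothing stops the OS from evicting the pages later.
    // Assumes the file is smaller than 2 GB (the limit of a single mapping).
    static MappedByteBuffer mapAndLoad(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load();
            return buf; // the mapping stays valid after the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin"); // hypothetical index file
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        MappedByteBuffer buf = mapAndLoad(tmp);
        System.out.println("mapped " + buf.capacity() + " bytes");
    }
}
```

Because load() touches every page of the mapping, it only pays off when
the whole file fits in free RAM, which is why it is a blunt tool.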

Though... RAM is so cheap these days that any "real" production search
deployment should aim to have the full index hot, in RAM.  Lucene
really should provide a good impl for "RAM only" indexes.
RAMDirectory/MMapDir are not the answer, since Lucene is still using
postings formats designed for a single scan through a file on disk,
where the goal is to minimize bytes consumed (eg the VInt format is
compact but not CPU friendly).  We should start from
contrib/instantiated and contrib/memory and iterate from there...
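
As an illustration of the VInt trade-off, here is a minimal sketch of
the variable-length int encoding (7 data bits per byte, low-order bits
first, high bit set when more bytes follow).  The class VIntSketch and
its method names are hypothetical; this is not Lucene's actual
IndexInput/IndexOutput code.

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {
    // Encodes an int in 1 to 5 bytes: 7 data bits per byte, low-order group first.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decodes one value starting at pos. Each byte needs a data-dependent
    // branch, and the start of the next value is unknown until this one is read.
    static int decode(byte[] bytes, int pos) {
        byte b = bytes[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] enc = encode(300);
        System.out.println(enc.length + " bytes, decoded = " + decode(enc, 0));
    }
}
```

Small values take a single byte, which keeps files small, but the
byte-at-a-time loop is the kind of work a RAM-resident format could
avoid by spending more bytes per value.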

> But yeah, we can't do anything with this unless we had a JNI
> library that interacts more directly with the IO system
> (allowing us to configure whether IO is cached etc), which
> perhaps exists or could exist in the future (or Java7?).

Yes please post feature requests to Sun ;)

But I think in the short term Lucene will have to drop to native code
to tell the OS not to cache bytes read by segment merging...

