lucene-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Lucene memory usage
Date Thu, 11 Jun 2009 20:30:57 GMT
> Yes please post feature requests to Sun ;)

I signed up for
http://mail.openjdk.java.net/mailman/listinfo/nio-discuss

> But I think in the short term Lucene will have to drop to
native code to tell OS not to cache bytes read by segment
merging...

LUCENE-1121 uses transferTo, which presumably doesn't run bytes
through the IO cache? Granted, it's slower on most platforms, but
could this be fixed in future Java releases?
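For reference, here is a minimal sketch of copying a file with `FileChannel.transferTo`, the java.nio call mentioned above. It is not the LUCENE-1121 patch itself; the class and method names are hypothetical, and the sketch makes no claim about whether a given OS or JDK keeps the copied bytes in its cache.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToCopy {
    /** Copies src to dst with FileChannel.transferTo and returns the number of bytes moved. */
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                // A single call may transfer fewer bytes than requested, so loop.
                long n = in.transferTo(pos, size - pos, out);
                if (n <= 0) {
                    break; // nothing more to transfer (e.g. the source shrank)
                }
                pos += n;
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("seg", ".src");
        Path dst = Files.createTempFile("seg", ".dst");
        Files.write(src, "segment bytes".getBytes());
        System.out.println(copy(src, dst) + " bytes copied");
    }
}
```

The loop is there because `transferTo` is allowed to move fewer bytes than asked for in one call.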

On Thu, Jun 11, 2009 at 12:50 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Thu, Jun 11, 2009 at 3:21 PM, Jason
> Rutherglen<jason.rutherglen@gmail.com> wrote:
> > Makes sense.
> >
> > Currently MMapDirectory doesn't write using mapped byte buffers.
> > Would the memory management of the OS behave differently if we
> > were writing to the mmapped ByteBuffers as opposed to writing to
> > an RAF (like with FSDir)?
>
> I would assume not, but would be good to confirm (Earwin, where's your
> improved MMapDir?).
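To make the two write paths in this question concrete, here is a hedged sketch with hypothetical names: one method writes through a `RandomAccessFile`, as the message says FSDir does, and the other sizes the file and stores bytes into a `READ_WRITE` mapping. It is not Lucene's FSDirectory or MMapDirectory code; it only shows the mechanics and does not measure how the OS caches either path.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public class WritePaths {
    /** RAF-style write: ordinary write calls on a RandomAccessFile. */
    static void writeWithRaf(Path p, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw")) {
            raf.setLength(0); // discard any previous contents
            raf.write(data);
        }
    }

    /** Hypothetical mmap-based write: size the file, map it READ_WRITE, store bytes into the mapping. */
    static void writeWithMmap(Path p, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            raf.setLength(data.length); // storing into a mapping cannot grow the file, so size it first
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buf.put(data); // dirties pages of the mapping; the OS writes them back later
            buf.force();   // ask the OS to flush the dirty pages to the device now
        }
    }
}
```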
>
> At the page level it's all basic LRU, and LRU is not a good policy for
> deciding which search data structures are best kept RAM resident.
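A toy model of the LRU point, assuming strict LRU as the message describes (real kernels use approximations of it): a few hot pages, standing in for frequently used search structures, are touched once, then a merge streams pages through the cache exactly once. The class, method names and sizes are arbitrary illustrations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruPageCache {
    /** A toy page cache with strict LRU eviction, built on LinkedHashMap's access order. */
    static final class Lru<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        Lru(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: get/put move an entry to the most-recent end
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently used page
        }
    }

    /** Loads hot pages, streams single-use merge pages through, and counts hot pages still cached. */
    static int hotPagesSurvivingMerge(int capacity, int hotPages, int mergePages) {
        Lru<String, Boolean> cache = new Lru<>(capacity);
        for (int i = 0; i < hotPages; i++) {
            cache.put("hot-" + i, Boolean.TRUE);
        }
        for (int i = 0; i < mergePages; i++) {
            cache.put("merge-" + i, Boolean.TRUE); // read once by the merge, never again
        }
        int survivors = 0;
        for (int i = 0; i < hotPages; i++) {
            if (cache.containsKey("hot-" + i)) { // containsKey does not change access order
                survivors++;
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println(hotPagesSurvivingMerge(8, 4, 8));
    }
}
```

With a capacity of 8 pages, 4 hot pages and 8 merge pages, every hot page is evicted even though the merge pages are never read again; recency alone cannot tell the two kinds of page apart.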
>
> >> Well... locality is still important. Under the hood, mmap on a
> >> page miss must hit the disk.
> >
> > Maybe this is where MappedByteBuffer.load, as Earwin has
> > mentioned, comes in handy?
>
> That's unfortunately a rather blunt tool, and only practical when
> available RAM exceeds the index size.  Warming your particular
> searches is more precise...
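A minimal sketch of the `MappedByteBuffer.load` approach: map a whole file read-only and ask the OS to page it in. The class and method names are hypothetical. As the reply notes, this only helps when RAM can hold the mapped data; `load()` is a best-effort hint, and the pages can still be evicted afterwards.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreloadMapping {
    /** Maps an entire file read-only and asks the OS to load all of its pages. */
    static MappedByteBuffer mapAndLoad(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single map() call is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touches every page; the OS may still evict them later under memory pressure.
            buf.load();
            return buf; // the mapping stays valid after the channel is closed
        }
    }
}
```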
>
> Though..... RAM prices are so cheap these days that any "real"
> production search deployment should always aim to have the full index
> hot, in RAM.  Lucene really should provide a good impl for "RAM only"
> indexes.  RAMDirectory/MMapDir are not the answer, since Lucene is
> still using postings formats designed for single scan through a file
> that resides on disk where bytes consumed on disk are minimized (eg
> the VInt format is not CPU friendly).  We should start from
> contrib/instantiated and contrib/memory and iterate from there...
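To illustrate the VInt point, here is a sketch of the variable-length integer layout Lucene used at the time: 7 payload bits per byte, low-order group first, high bit set when more bytes follow. The class and method names are hypothetical; Lucene's own versions are `IndexOutput.writeVInt` and `IndexInput.readVInt`. Decoding needs a data-dependent branch for every byte, which is one plausible reason the format trades CPU for compactness.

```java
import java.io.ByteArrayOutputStream;

public class VIntCodec {
    /** Appends i using 7 bits per byte, low-order group first; a set high bit means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7; // unsigned shift so negative ints terminate (as 5 bytes)
        }
        out.write(i);
    }

    /** Decodes one VInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        // One branch per byte, depending on the data: hard for the CPU to predict.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`, while a fixed-width int would always take four.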
>
> > But yeah, we can't do anything with this unless we had a JNI
> > library that interacts more directly with the IO system
> > (allowing us to configure whether IO is cached etc), which
> > perhaps exists or could exist in the future (or Java7?).
>
> Yes please post feature requests to Sun ;)
>
> But I think in the short term Lucene will have to drop to native code
> to tell OS not to cache bytes read by segment merging...
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
