lucene-java-user mailing list archives

From Michael McCandless
Subject Re: fadvise/madvise during segment-merges....
Date Wed, 21 May 2014 13:43:48 GMT
On Wed, May 21, 2014 at 8:20 AM, Ravikumar Govindarajan wrote:
> Great blog and lucid explanation
> I think things have changed in recent kernel versions. I am no expert, but
> could see some code related to this here

That looks promising.  But does that mean SEQUENTIAL will evict the
page once we're done reading it?

> O_DIRECT will be terrible drag no?

Actually O_DIRECT is awesome because it completely bypasses the buffer
cache, so nothing will be evicted.

The downside is you must do your own buffering/read-ahead into
userspace RAM, so you need to be more careful about heap used...
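To make the "own buffering" point concrete, here is a rough sketch rather than anything Lucene ships. It uses Python only because its os module exposes O_DIRECT without JNI. The name read_bypassing_cache and the 1 MiB chunk size are made up for illustration, and the fallback to a normal buffered open is an assumption for filesystems that reject O_DIRECT.

```python
import mmap
import os

CHUNK = 1 << 20  # 1 MiB; should stay a multiple of the device block size


def read_bypassing_cache(path, chunk=CHUNK):
    """Yield a file's bytes chunk by chunk, using O_DIRECT where available."""
    direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    try:
        fd = os.open(path, os.O_RDONLY | direct)
    except OSError:
        # Assumption: fall back to a buffered open if O_DIRECT is rejected.
        fd = os.open(path, os.O_RDONLY)
    # Anonymous mmap memory is page-aligned, which O_DIRECT needs. This one
    # fixed-size buffer is all the RAM the reader uses, whatever the file size.
    buf = mmap.mmap(-1, chunk)
    try:
        while True:
            n = os.readv(fd, [buf])
            if n:
                yield buf[:n]  # slicing an mmap copies the bytes out
            if n < chunk:
                break  # a short read on a regular file means end of file
    finally:
        buf.close()
        os.close(fd)
```

Memory use is bounded by the single reusable buffer; any read-ahead beyond one chunk would have to be added by the caller, since the kernel no longer does it.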

Also, Linus hates this option :)

> Will a battery-backed disk cache help here?

This will make IndexWriter.commit faster, since the IO device will be
able to return from fsync before bytes are actually moved to stable
storage.  But you really shouldn't need to call commit so frequently,
in which case a faster commit is not so important.

> We are using a SortingMergePolicy which most-often hits data randomly. Will
> SEQUENTIAL help here?

Oh hmm then you should NOT call SEQUENTIAL and should not use
O_DIRECT!  In fact, you want the IO pages for merging to enter the
buffer cache....
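A minimal sketch of that decision, assuming a Linux-like system where posix_fadvise is available. open_for_merge and its sequential flag are hypothetical names, not Lucene API; the point is only that the hint is applied when the merge reads front to back and skipped when it reads randomly, as a sorting merge might.

```python
import os


def open_for_merge(path, sequential):
    """Open a file for a merge, hinting SEQUENTIAL only when reads are in order."""
    fd = os.open(path, os.O_RDONLY)
    if sequential and hasattr(os, "posix_fadvise"):
        # offset=0, length=0 applies the advice to the whole file
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    # For random access no advice is given, so pages enter the buffer
    # cache under the kernel's default policy.
    return fd
```

The caller owns the returned descriptor and must close it with os.close.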

> Any reasons why you think DONTNEED will be less-useful?

Well, that option is too late?  Like, say I read in the N 1 GB files
to merge, then I call DONTNEED once the merge is done, but by then the
pages for searching have already been evicted.  I could instead call
DONTNEED every few KB of reads/writes but that seems hackish, like
it's a poor emulation of what SEQUENTIAL would express.
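For reference, the "DONTNEED as you go" emulation might look roughly like the sketch below, on the read side only. read_dropping_cache and the 1 MiB chunk size are illustrative choices. Written data would need more care: per the posix_fadvise man page, pages that have not yet been written out are unaffected by DONTNEED unless they are synced first.

```python
import os


def read_dropping_cache(path, chunk=1 << 20):
    """Read a file front to back, asking the kernel to drop each chunk after use."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            yield data
            # Runs once the consumer asks for the next chunk: hint that the
            # byte range just consumed will not be read again.
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
```

The hint is advisory, so the kernel may ignore it.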

But net/net there has been good progress lately, new IO APIs in Java,
improvements to Linux kernel, etc.  There are also sneaky ways to
invoke some of these OS-level APIs without using JNI (the JDK has some
internal APIs).  I think we should explore this area more, to minimize
the cost of merging on ongoing searches.

Mike McCandless
