lucene-dev mailing list archives

From Michael McCandless
Subject Re: My GSOC proposal
Date Tue, 05 Apr 2011 21:07:03 GMT
Hi Varun,

Those two issues would make a great GSoC!  Comments below...

On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker wrote:

> I would like to combine two tasks as part of my project,
> namely: Directory createOutput and openInput should take an IOContext
> (LUCENE-2793), and complement it with Generalize DirectIOLinuxDir to
> UnixDir (LUCENE-2795).
> The first part of the project is aimed at significantly reducing the time
> taken to search during indexing by adding an IOContext which would
> store the buffer size and have options to bypass the OS’s buffer cache
> (this is what causes the slowdown in search) and other hints. Once
> completed, I would move on to LUCENE-2795 and generalize the Directory
> implementation to make a UnixDirectory.

So, the first part (LUCENE-2793) should cause no change at all to
performance, functionality, etc., because it's "merely" installing the
plumbing (IOContext threaded throughout the low-level store APIs in
Lucene) so that higher levels can send important details down to the
Directory.  We'd fix IndexWriter/IndexReader to fill out this
IOContext with the details (merging, flushing, new reader, etc.).

There's some fun/freedom here in figuring out just what details should
be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
or is it high level "I am opening a new near-real-time reader").

This first step is a rote cutover, just changing APIs but in no way
taking advantage of the new APIs.
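
To make the plumbing concrete, here is a rough sketch of what the cutover
could look like. Every name in it (IOContext's fields, the Usage enum, the
InMemoryDir class, the 4 KB default) is an assumption made up for
illustration, not a proposed or existing Lucene API. It carries both a
high-level hint and a low-level buffer size, and the directory accepts the
context but deliberately ignores it, which is the point of step one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names come from the Lucene code base.
final class IOContext {
    // High-level description of why the file is being opened.
    enum Usage { DEFAULT, FLUSH, MERGE, NEW_NRT_READER }

    // Illustrative default for the low-level hint (4 KB).
    static final int DEFAULT_BUFFER_SIZE = 4096;

    final Usage usage;
    final int bufferSizeHint;

    IOContext(Usage usage, int bufferSizeHint) {
        if (usage == null || bufferSizeHint <= 0) {
            throw new IllegalArgumentException("usage must be set and bufferSizeHint positive");
        }
        this.usage = usage;
        this.bufferSizeHint = bufferSizeHint;
    }

    static IOContext of(Usage usage) {
        return new IOContext(usage, DEFAULT_BUFFER_SIZE);
    }
}

// Stand-in for a Directory: the context is threaded through the API,
// but behavior is identical no matter which context is passed.
final class InMemoryDir {
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    ByteArrayOutputStream createOutput(String name, IOContext context) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        files.put(name, out);
        return out;
    }

    ByteArrayInputStream openInput(String name, IOContext context) {
        ByteArrayOutputStream out = files.get(name);
        if (out == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        // Snapshot of the bytes written so far.
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```

A caller such as a merge would pass IOContext.of(IOContext.Usage.MERGE) to
createOutput, and a new reader would pass its own context to openInput;
until a Directory chooses to look at the context, nothing changes.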

The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
by creating a UnixDir impl that, using JNI (C code), passes advanced
flags when opening files, based on the incoming IOContext.

The goal is a single UnixDir that has ifdefs so that it's usable
across multiple Unices, and eg would use direct IO if the context is
merging.  If we are ambitious we could rope Windows into the mix, too,
and then this would be NativeDir...
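
As a rough illustration of the Java side of that decision, the sketch below
maps an incoming context to the open mode the native code would be asked to
use. The class, enum, and method names are all hypothetical, and the JNI/C
half is left out entirely; the only rule encoded is the one above (direct IO
when the context is merging).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: names are invented for illustration.
final class NativeOpenPolicy {
    enum Context { DEFAULT, FLUSH, MERGE, READ }

    // Abstract open modes; the native (C) side would translate these into
    // platform-specific flags, e.g. O_DIRECT for DIRECT on Linux.
    enum OpenFlag { BUFFERED, DIRECT }

    private NativeOpenPolicy() {}

    // Merges bypass the OS buffer cache; everything else stays buffered.
    static Set<OpenFlag> flagsFor(Context context) {
        if (context == Context.MERGE) {
            return EnumSet.of(OpenFlag.DIRECT);
        }
        return EnumSet.of(OpenFlag.BUFFERED);
    }
}
```

Keeping the policy in Java and only the flag translation in C would leave
the ifdefs confined to one small native file, though that split is just one
possible design.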

We can measure success by validating that a big merge while searching
does not hurt search performance?  (Ie we should be able to reproduce
the results from

> I have spoken to Michael McCandless and Simon Willnauer about
> undertaking these tasks. Michael McCandless has agreed to mentor me.
> I would love to be able to contribute to and learn from the Apache Lucene
> community this summer. I would also love suggestions on how to make my
> application proposal stronger.

I think either Simon or I can be the "official" mentor, and then the
other one of us (and other Lucene committers) will support/chime in.

This is an important change for Lucene!

