lucene-dev mailing list archives

From Simon Willnauer <>
Subject Re: My GSOC proposal
Date Wed, 06 Apr 2011 06:41:23 GMT
Hey Varun,
On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless wrote:
> Hi Varun,
> Those two issues would make a great GSoC!  Comments below...
> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker wrote:
>> I would like to combine two tasks as part of my project:
>> "Directory createOutput and openInput should take an IOContext"
>> (LUCENE-2793), complemented by "Generalize DirectIOLinuxDir to
>> UnixDir" (LUCENE-2795).
>> The first part of the project is aimed at significantly reducing the
>> time taken to search during indexing by adding an IOContext, which
>> would store the buffer size, options to bypass the OS's buffer cache
>> (this is what causes the slowdown in search), and other hints. Once
>> that is completed I would move on to LUCENE-2795 and generalize the
>> Directory implementation to make a UnixDirectory.
> So, the first part (LUCENE-2793) should cause no change at all to
> performance, functionality, etc., because it's "merely" installing the
> plumbing (IOContext threaded throughout the low-level store APIs in
> Lucene) so that higher levels can send important details down to the
> Directory.  We'd fix IndexWriter/IndexReader to fill out this
> IOContext with the details (merging, flushing, new reader, etc.).
> There's some fun/freedom here in figuring out just what details should
> be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
> or is it high level "I am opening a new near-real-time reader").
> This first step is a rote cutover, just changing APIs but in no way
> taking advantage of the new APIs.
> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
> by creating a UnixDir impl that, using JNI (C code), passes advanced
> flags when opening files, based on the incoming IOContext.
> The goal is a single UnixDir that has ifdefs so that it's usable
> across multiple Unices, and eg would use direct IO if the context is
> merging.  If we are ambitious we could rope Windows into the mix, too,
> and then this would be NativeDir...
> We can measure success by validating that a big merge while searching
> does not hurt search performance?  (Ie we should be able to reproduce
> the results from

Thanks for the summary, Mike!
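
To make the two steps concrete, here is a minimal Java sketch of what such a context object and a directory-side policy could look like. It is not the API from LUCENE-2793: the class name IOContextSketch, the Context values, the bufferSize field, and the useDirectIO helper are all hypothetical, chosen only to illustrate the "high-level hint in, low-level decision out" idea described above.

```java
import java.util.Objects;

public class IOContextSketch {

    /** High-level reason a file is being opened (hypothetical names). */
    enum Context { MERGE, FLUSH, READ, DEFAULT }

    /** Immutable bag of hints handed down to the directory (hypothetical). */
    static final class IOContext {
        final Context context;
        final int bufferSize; // low-level hint, in bytes

        IOContext(Context context, int bufferSize) {
            if (bufferSize <= 0) {
                throw new IllegalArgumentException("bufferSize must be positive");
            }
            this.context = Objects.requireNonNull(context, "context");
            this.bufferSize = bufferSize;
        }
    }

    /**
     * Policy a native directory might apply (an assumption for illustration):
     * bypass the OS buffer cache only when the caller says it is merging.
     */
    static boolean useDirectIO(IOContext ctx) {
        return ctx.context == Context.MERGE;
    }

    public static void main(String[] args) {
        IOContext merge = new IOContext(Context.MERGE, 1 << 20);
        IOContext read = new IOContext(Context.READ, 4096);
        System.out.println("merge uses direct IO: " + useDirectIO(merge));
        System.out.println("read uses direct IO: " + useDirectIO(read));
    }
}
```

In this sketch the first step (LUCENE-2793) corresponds to passing an IOContext everywhere without anyone consulting it, and the second step (LUCENE-2795) corresponds to a directory implementation actually calling something like useDirectIO when it opens a file.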
>> I have spoken to Michael McCandless and Simon Willnauer about
>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>> I would love to contribute to and learn from the Apache Lucene
>> community this summer. I would also welcome suggestions on how to
>> make my application proposal stronger.
> I think either Simon or I can be the "official" mentor, and then the
> other one of us (and other Lucene committers) will support/chime
> in...

I will take the official responsibility here once we are there!
> This is an important change for Lucene!
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: