lucene-dev mailing list archives

From Varun Thacker <varunthacker1...@gmail.com>
Subject Re: My GSOC proposal
Date Thu, 07 Apr 2011 03:58:35 GMT
I have updated my proposal online to mention the time I would be able to
dedicate to the project.

On Thu, Apr 7, 2011 at 7:05 AM, Adriano Crestani
<adrianocrestani@gmail.com> wrote:

> Hi Varun,
>
> Nice proposal, very complete. Only one thing is missing: you should mention
> somewhere how many hours a week you are willing to spend working on the
> project and whether there are any holidays when you won't be able to work.
>
> Good luck ;)
>
>
> On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker <varunthacker1989@gmail.com> wrote:
>
>> I have drafted the proposal on the official GSoC website. Here is the link
>> to my proposal: http://goo.gl/uYXrV
>> Please do let me know if anything needs to be changed, added, or removed.
>>
>> I will keep on working on it till the deadline on the 8th.
>>
>> On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>
>>> That test code looks good -- you really should have seen awful
>>> performance had you used O_DIRECT since you read byte by byte.
>>>
>>> A more realistic test is to read a whole buffer (eg 4 KB is what
>>> Lucene now uses during merging, but we'd probably up this to like 1 MB
>>> when using O_DIRECT).
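
A minimal sketch of the "read a whole buffer" test that Mike suggests, written in Java rather than the C of the original test program. The class and method names are hypothetical and not part of Lucene, and the code reads through the normal OS buffer cache (it does not use O_DIRECT); it only illustrates replacing byte-by-byte reads with chunked reads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not Lucene code and not Varun's test.
final class BufferedReadSketch {
    // 4 KB, the merge buffer size mentioned in the message above.
    static final int BUFFER_SIZE = 4 * 1024;

    // Reads the file one buffer at a time and returns the number of bytes read.
    static long readAll(Path file, int bufferSize) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // reuse the same buffer for the next chunk
            }
        }
        return total;
    }
}
```

Timing readAll with different bufferSize values (for example 1 versus BUFFER_SIZE) is one way to see the per-call overhead that a byte-by-byte loop pays.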
>>>
>>> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
>>> for good reason: its existence means projects like ours can use it to
>>> "work around" limitations in the Linux IO apis that control the buffer
>>> cache when, otherwise, we might conceivably make patches to fix Linux
>>> correctly.  It's an escape hatch, and we all use the escape hatch
>>> instead of trying to fix Linux for real...
>>>
>>> For example the NOREUSE flag is a no-op now in Linux, which is a
>>> shame, because that's precisely the flag we'd want to use for merging
>>> (along with SEQUENTIAL).  Had that flag been implemented well, it'd
>>> give better results than our workaround using O_DIRECT.
>>>
>>> Anyway, given how things are, until we can get more control (waaaay
>>> up in Javaland) over the buffer cache, O_DIRECT (via native directory
>>> impl through JNI) is our only real option, today.
>>>
>>> More details here:
>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>>>
>>> Note that other OSs likely do a better job and actually implement
>>> NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
>>> would simply use NOREUSE on these platforms for I/O during segment
>>> merging.
>>>
>>> Mike
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
>>>  <varunthacker1989@gmail.com> wrote:
>>> > Hi. I wrote some sample code to test the speed difference between
>>> > SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
>>> >
>>> > This is the link to the code: http://pastebin.com/8QywKGyS
>>> >
>>> > There was a speed difference when I switched between the two flags. I
>>> > have not used the O_DIRECT flag because Linus had criticized it.
>>> >
>>> > Is this what the flags are intended to be used for? This is just
>>> > sample code with a test file.
>>> >
>>> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
>>> > <simon.willnauer@googlemail.com> wrote:
>>> >> Hey Varun,
>>> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
>>> >> <lucene@mikemccandless.com> wrote:
>>> >>> Hi Varun,
>>> >>>
>>> >>> Those two issues would make a great GSoC!  Comments below...
>>> >> +1
>>> >>>
>>> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
>>> >>> <varunthacker1989@gmail.com> wrote:
>>> >>>
>>> >>>> I would like to combine two tasks as part of my project, namely
>>> >>>> "Directory createOutput and openInput should take an IOContext"
>>> >>>> (LUCENE-2793), complemented by "Generalize DirectIOLinuxDir to
>>> >>>> UnixDir" (LUCENE-2795).
>>> >>>>
>>> >>>> The first part of the project is aimed at significantly reducing
>>> >>>> the time taken to search during indexing by adding an IOContext,
>>> >>>> which would store the buffer size and have options to bypass the
>>> >>>> OS’s buffer cache (this is what causes the slowdown in search) and
>>> >>>> other hints. Once completed, I would move on to LUCENE-2795 and
>>> >>>> generalize the Directory implementation to make a UnixDirectory.
>>> >>>
>>> >>> So, the first part (LUCENE-2793) should cause no change at all to
>>> >>> performance, functionality, etc., because it's "merely" installing
>>> >>> the plumbing (IOContext threaded throughout the low-level store APIs
>>> >>> in Lucene) so that higher levels can send important details down to
>>> >>> the Directory.  We'd fix IndexWriter/IndexReader to fill out this
>>> >>> IOContext with the details (merging, flushing, new reader, etc.).
>>> >>>
>>> >>> There's some fun/freedom here in figuring out just what details
>>> >>> should be included in IOContext... (eg: is it low level "set buffer
>>> >>> size to 4 KB" or is it high level "I am opening a new
>>> >>> near-real-time reader").
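
To make the design question concrete, here is one possible shape for such a context object, carrying both a high-level reason and an optional low-level buffer size. This is only a sketch: the class name, enum values, and fields are hypothetical and are not the design that LUCENE-2793 settled on.

```java
// Hypothetical sketch: names and fields are illustrative, not the LUCENE-2793 design.
final class IOContextSketch {
    // High-level detail: why the file is being opened.
    enum Context { MERGE, FLUSH, NEW_READER, NEAR_REAL_TIME_READER }

    final Context context;
    // Low-level detail: buffer size in bytes; 0 means "let the Directory decide".
    final int bufferSize;

    IOContextSketch(Context context, int bufferSize) {
        if (context == null) {
            throw new IllegalArgumentException("context must not be null");
        }
        if (bufferSize < 0) {
            throw new IllegalArgumentException("bufferSize must be >= 0");
        }
        this.context = context;
        this.bufferSize = bufferSize;
    }

    // Convenience factory for callers that only know the high-level reason.
    static IOContextSketch of(Context context) {
        return new IOContextSketch(context, 0);
    }
}
```

A caller such as a merge would then pass something like IOContextSketch.of(IOContextSketch.Context.MERGE) down to the directory, leaving the buffer size to the implementation.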
>>> >>>
>>> >>> This first step is a rote cutover, just changing APIs but in no way
>>> >>> taking advantage of the new APIs.
>>> >>>
>>> >>> The 2nd step (LUCENE-2795) would then take advantage of this
>>> >>> plumbing, by creating a UnixDir impl that, using JNI (C code), passes
>>> >>> advanced flags when opening files, based on the incoming IOContext.
>>> >>>
>>> >>> The goal is a single UnixDir that has ifdefs so that it's usable
>>> >>> across multiple Unices, and eg would use direct IO if the context
>>> >>> is merging.  If we are ambitious we could rope Windows into the mix,
>>> >>> too, and then this would be NativeDir...
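
The Java side of that decision could be sketched roughly as below, with the native (JNI) open call left out. Everything here is an assumption for illustration: the class, enum, and method names do not exist in Lucene, and the buffer sizes simply reuse the 4 KB and 1 MB figures Mike mentions elsewhere in this thread.

```java
// Hypothetical sketch: neither the class nor its methods exist in Lucene.
final class NativeDirSketch {
    // Reason for the I/O, as it might arrive via an IOContext.
    enum Context { MERGE, FLUSH, READ }

    static final int DEFAULT_BUFFER_SIZE = 4 * 1024;       // 4 KB
    static final int DIRECT_IO_BUFFER_SIZE = 1024 * 1024;  // 1 MB

    // Direct I/O is chosen only for merging, so merges bypass the buffer cache.
    static boolean useDirectIO(Context context) {
        return context == Context.MERGE;
    }

    // Larger buffers when bypassing the cache, the default size otherwise.
    static int bufferSize(Context context) {
        return useDirectIO(context) ? DIRECT_IO_BUFFER_SIZE : DEFAULT_BUFFER_SIZE;
    }
}
```

Keeping the policy in one small place like this would let the platform-specific C code stay limited to translating a boolean and a size into the right open flags.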
>>> >>>
>>> >>> We can measure success by validating that a big merge while searching
>>> >>> does not hurt search performance?  (Ie we should be able to reproduce
>>> >>> the results from
>>> >>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>> >>
>>> >> Thanks for the summary mike!
>>> >>>
>>> >>>> I have spoken to Michael McCandless and Simon Willnauer about
>>> >>>> undertaking these tasks. Michael McCandless has agreed to mentor
>>> >>>> me. I would love to be able to contribute to and learn from the
>>> >>>> Apache Lucene community this summer. Also, I would love suggestions
>>> >>>> on how to make my application proposal stronger.
>>> >>>
>>> >>> I think either Simon or I can be the "official" mentor, and then
>>> >>> the other one of us (and other Lucene committers) will support/chime
>>> >>> in...
>>> >>
>>> >> I will take the official responsibility here once we are there!
>>> >> simon
>>> >>>
>>> >>> This is an important change for Lucene!
>>> >>>
>>> >>> Mike
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> >
>>> > Regards,
>>> > Varun Thacker
>>> > http://varunthacker.wordpress.com
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>>
>>
>> Regards,
>> Varun Thacker
>> http://varunthacker.wordpress.com
>>
>>
>>
>


-- 


Regards,
Varun Thacker
http://varunthacker.wordpress.com
