lucene-dev mailing list archives

From Varun Thacker <varunthacker1...@gmail.com>
Subject Re: My GSOC proposal
Date Fri, 08 Apr 2011 11:10:34 GMT
I have refined my proposal here: http://goo.gl/uYXrV

Are there any suggestions I should incorporate into my proposal before
today's deadline?

On Thu, Apr 7, 2011 at 9:28 AM, Varun Thacker <varunthacker1989@gmail.com> wrote:

> I have updated my proposal online to mention the time I would be able to
> dedicate to the project.
>
>
> On Thu, Apr 7, 2011 at 7:05 AM, Adriano Crestani <
> adrianocrestani@gmail.com> wrote:
>
>> Hi Varun,
>>
>> Nice proposal, very complete. Only one thing is missing: you should mention
>> somewhere how many hours a week you are willing to spend working on the
>> project and whether there are any holidays when you won't be able to work.
>>
>> Good luck ;)
>>
>>
>> On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker <
>> varunthacker1989@gmail.com> wrote:
>>
>>> I have drafted the proposal on the official GSoC website. This is the
>>> link to my proposal: http://goo.gl/uYXrV
>>> Please do let me know if anything needs to be changed, added or removed.
>>>
>>> I will keep on working on it till the deadline on the 8th.
>>>
>>> On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>> That test code looks good -- you really should have seen awful
>>>> performance had you used O_DIRECT since you read byte by byte.
>>>>
>>>> A more realistic test is to read a whole buffer (eg 4 KB is what
>>>> Lucene now uses during merging, but we'd probably up this to like 1 MB
>>>> when using O_DIRECT).
>>>>
>>>> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
>>>> for good reason: its existence means projects like ours can use it to
>>>> "work around" limitations in the Linux IO APIs that control the buffer
>>>> cache when, otherwise, we might conceivably make patches to fix Linux
>>>> correctly.  It's an escape hatch, and we all use the escape hatch
>>>> instead of trying to fix Linux for real...
>>>>
>>>> For example the NOREUSE flag is a no-op now in Linux, which is a
>>>> shame, because that's precisely the flag we'd want to use for merging
>>>> (along with SEQUENTIAL).  Had that flag been implemented well, it'd
>>>> give better results than our workaround using O_DIRECT.
>>>>
>>>> Anyway, given how things are, until we can get more control (waaaay
>>>> up in Javaland) over the buffer cache, O_DIRECT (via native directory
>>>> impl through JNI) is our only real option, today.
>>>>
>>>> More details here:
>>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>>>>
>>>> Note that other OSs likely do a better job and actually implement
>>>> NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
>>>> would simply use NOREUSE on these platforms for I/O during segment
>>>> merging.
>>>>
>>>> Mike
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
>>>>  <varunthacker1989@gmail.com> wrote:
>>>> > Hi. I wrote some sample code to test the speed difference between
>>>> > SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
>>>> >
>>>> > This is the link to the code: http://pastebin.com/8QywKGyS
>>>> >
>>>> > There was a speed difference when I switched between the two flags. I
>>>> > have not used the O_DIRECT flag because Linus had criticized it.
>>>> >
>>>> > Is this what the flags are intended to be used for? This is just
>>>> > sample code with a test file.
>>>> >
>>>> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
>>>> > <simon.willnauer@googlemail.com> wrote:
>>>> >> Hey Varun,
>>>> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
>>>> >> <lucene@mikemccandless.com> wrote:
>>>> >>> Hi Varun,
>>>> >>>
>>>> >>> Those two issues would make a great GSoC!  Comments below...
>>>> >> +1
>>>> >>>
>>>> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
>>>> >>> <varunthacker1989@gmail.com> wrote:
>>>> >>>
>>>> >>>> I would like to combine two tasks as part of my project, namely
>>>> >>>> "Directory createOutput and openInput should take an IOContext"
>>>> >>>> (LUCENE-2793), complemented by "Generalize DirectIOLinuxDir to
>>>> >>>> UnixDir" (LUCENE-2795).
>>>> >>>>
>>>> >>>> The first part of the project is aimed at significantly reducing the
>>>> >>>> time taken to search during indexing by adding an IOContext, which
>>>> >>>> would store the buffer size and have options to bypass the OS’s buffer
>>>> >>>> cache (this is what causes the slowdown in search) and other hints.
>>>> >>>> Once completed, I would move on to LUCENE-2795 and generalize the
>>>> >>>> Directory implementation to make a UnixDirectory.
>>>> >>>
>>>> >>> So, the first part (LUCENE-2793) should cause no change at all to
>>>> >>> performance, functionality, etc., because it's "merely" installing the
>>>> >>> plumbing (IOContext threaded throughout the low-level store APIs in
>>>> >>> Lucene) so that higher levels can send important details down to the
>>>> >>> Directory.  We'd fix IndexWriter/IndexReader to fill out this
>>>> >>> IOContext with the details (merging, flushing, new reader, etc.).
>>>> >>>
>>>> >>> There's some fun/freedom here in figuring out just what details should
>>>> >>> be included in IOContext... (eg: is it low level "set buffer size to
>>>> >>> 4 KB" or is it high level "I am opening a new near-real-time reader").
>>>> >>>
>>>> >>> This first step is a rote cutover, just changing APIs but in no way
>>>> >>> taking advantage of the new APIs.
>>>> >>>
>>>> >>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
>>>> >>> by creating a UnixDir impl that, using JNI (C code), passes advanced
>>>> >>> flags when opening files, based on the incoming IOContext.
>>>> >>>
>>>> >>> The goal is a single UnixDir that has ifdefs so that it's usable
>>>> >>> across multiple Unices, and eg would use direct IO if the context is
>>>> >>> merging.  If we are ambitious we could rope Windows into the mix, too,
>>>> >>> and then this would be NativeDir...
>>>> >>>
>>>> >>> We can measure success by validating that a big merge while searching
>>>> >>> does not hurt search performance?  (Ie we should be able to reproduce
>>>> >>> the results from
>>>> >>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>>> >>
>>>> >> Thanks for the summary, Mike!
>>>> >>>
>>>> >>>> I have spoken to Michael McCandless and Simon Willnauer about
>>>> >>>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>>>> >>>> I would love to be able to contribute to and learn from the Apache
>>>> >>>> Lucene community this summer. I would also love suggestions on how to
>>>> >>>> make my application proposal stronger.
>>>> >>>
>>>> >>> I think either Simon or I can be the "official" mentor, and then the
>>>> >>> other one of us (and other Lucene committers) will support/chime
>>>> >>> in...
>>>> >>
>>>> >> I will take the official responsibility here once we are there!
>>>> >> simon
>>>> >>>
>>>> >>> This is an important change for Lucene!
>>>> >>>
>>>> >>> Mike
>>>> >>>
>>>> >>>
>>>> ---------------------------------------------------------------------
>>>> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> >>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> >
>>>> >
>>>> > Regards,
>>>> > Varun Thacker
>>>> > http://varunthacker.wordpress.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>
>
>


-- 


Regards,
Varun Thacker
http://varunthacker.wordpress.com
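
The two steps discussed in the thread (LUCENE-2793: pass an IOContext down to the Directory; LUCENE-2795: let a native directory choose direct IO when the context is merging) could be sketched roughly as below. This is only an illustration under stated assumptions: the names IOContextSketch, Usage, bufferSize and useDirectIO are made up for this example and are not Lucene's API, and the thread leaves open whether IOContext should carry low-level or high-level details, so this sketch simply carries one of each. The 4 KB and 1 MB sizes are the ones mentioned above for merging without and with O_DIRECT.

```java
import java.util.Objects;

/** Hypothetical sketch only; these names are assumptions, not Lucene's API. */
public class IOContextSketch {

    /** High-level reason a file is being opened (assumed set of values). */
    public enum Usage { MERGE, FLUSH, READ }

    /** Immutable bundle of hints that higher levels would hand to a Directory. */
    public static final class IOContext {
        public final Usage usage;
        public final int bufferSize; // in bytes

        public IOContext(Usage usage, int bufferSize) {
            if (bufferSize <= 0) {
                throw new IllegalArgumentException("bufferSize must be positive");
            }
            this.usage = Objects.requireNonNull(usage, "usage");
            this.bufferSize = bufferSize;
        }
    }

    /**
     * The decision a native directory might make from the incoming context:
     * bypass the OS buffer cache only for merge traffic.
     */
    public static boolean useDirectIO(IOContext context) {
        return context.usage == Usage.MERGE;
    }

    public static void main(String[] args) {
        // 1 MB buffer for merging with direct IO, 4 KB for ordinary reads.
        IOContext merge = new IOContext(Usage.MERGE, 1 << 20);
        IOContext read = new IOContext(Usage.READ, 4 * 1024);

        System.out.println("merge uses direct IO: " + useDirectIO(merge));
        System.out.println("read uses direct IO: " + useDirectIO(read));
    }
}
```

In this sketch the first step would change nothing observable, since it only threads the IOContext object through the open calls; only a directory that consults something like useDirectIO would behave differently during merges.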
