lucene-dev mailing list archives

From Adriano Crestani <adrianocrest...@gmail.com>
Subject Re: My GSOC proposal
Date Thu, 07 Apr 2011 01:35:30 GMT
Hi Varun,

Nice proposal, very complete. Only one thing is missing: you should mention
somewhere how many hours a week you are willing to spend working on the
project and whether there are any holidays when you won't be able to work.

Good luck ;)

On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker <varunthacker1989@gmail.com> wrote:

> I have drafted the proposal on the official GSoC website. Here is the link
> to my proposal: http://goo.gl/uYXrV
> Please let me know if anything needs to be changed, added, or removed.
>
> I will keep on working on it till the deadline on the 8th.
>
> On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> That test code looks good -- you really should have seen awful
>> performance had you used O_DIRECT since you read byte by byte.
>>
>> A more realistic test is to read a whole buffer (eg 4 KB is what
>> Lucene now uses during merging, but we'd probably up this to like 1 MB
>> when using O_DIRECT).
>>
>> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
>> for good reason: its existence means projects like ours can use it to
>> "work around" limitations in the Linux IO apis that control the buffer
>> cache when, otherwise, we might conceivably make patches to fix Linux
>> correctly.  It's an escape hatch, and we all use the escape hatch
>> instead of trying to fix Linux for real...
>>
>> For example the NOREUSE flag is a no-op now in Linux, which is a
>> shame, because that's precisely the flag we'd want to use for merging
>> (along with SEQUENTIAL).  Had that flag been implemented well, it'd
>> give better results than our workaround using O_DIRECT.
>>
>> Anyway, given how things are, until we can get more control (waaaay
>> up in Javaland) over the buffer cache, O_DIRECT (via native directory
>> impl through JNI) is our only real option, today.
>>
>> More details here:
>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>>
>> Note that other OSs likely do a better job and actually implement
>> NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
>> would simply use NOREUSE on these platforms for I/O during segment
>> merging.
>>
>> Mike
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
>> <varunthacker1989@gmail.com> wrote:
>> > Hi. I wrote some sample code to test the speed difference between
>> > SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
>> >
>> > This is the link to the code: http://pastebin.com/8QywKGyS
>> >
>> > There was a speed difference when I switched between the two flags. I
>> > have not used the O_DIRECT flag because Linus had criticized it.
>> >
>> > Is this what the flags are intended to be used for? This is just
>> > sample code with a test file.
>> >
>> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
>> > <simon.willnauer@googlemail.com> wrote:
>> >> Hey Varun,
>> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
>> >> <lucene@mikemccandless.com> wrote:
>> >>> Hi Varun,
>> >>>
>> >>> Those two issues would make a great GSoC!  Comments below...
>> >> +1
>> >>>
>> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
>> >>> <varunthacker1989@gmail.com> wrote:
>> >>>
>> >>>> I would like to combine two tasks as part of my project, namely
>> >>>> "Directory createOutput and openInput should take an IOContext"
>> >>>> (LUCENE-2793), and complement it with "Generalize DirectIOLinuxDir to
>> >>>> UnixDir" (LUCENE-2795).
>> >>>>
>> >>>> The first part of the project is aimed at significantly reducing the
>> >>>> time taken to search during indexing by adding an IOContext, which would
>> >>>> store the buffer size and have options to bypass the OS’s buffer cache
>> >>>> (this is what causes the slowdown in search) and other hints. Once
>> >>>> completed, I would move on to LUCENE-2795 and generalize the Directory
>> >>>> implementation to make a UnixDirectory.
>> >>>
>> >>> So, the first part (LUCENE-2793) should cause no change at all to
>> >>> performance, functionality, etc., because it's "merely" installing the
>> >>> plumbing (IOContext threaded throughout the low-level store APIs in
>> >>> Lucene) so that higher levels can send important details down to the
>> >>> Directory.  We'd fix IndexWriter/IndexReader to fill out this
>> >>> IOContext with the details (merging, flushing, new reader, etc.).
>> >>>
>> >>> There's some fun/freedom here in figuring out just what details should
>> >>> be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
>> >>> or is it high level "I am opening a new near-real-time reader").
>> >>>
>> >>> This first step is a rote cutover, just changing APIs but in no way
>> >>> taking advantage of the new APIs.
>> >>>
>> >>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
>> >>> by creating a UnixDir impl that, using JNI (C code), passes advanced
>> >>> flags when opening files, based on the incoming IOContext.
>> >>>
>> >>> The goal is a single UnixDir that has ifdefs so that it's usable
>> >>> across multiple Unices, and eg would use direct IO if the context is
>> >>> merging.  If we are ambitious we could rope Windows into the mix, too,
>> >>> and then this would be NativeDir...
>> >>>
>> >>> We can measure success by validating that a big merge while searching
>> >>> does not hurt search performance?  (Ie we should be able to reproduce
>> >>> the results from
>> >>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>> >>
>> >> Thanks for the summary, Mike!
>> >>>
>> >>>> I have spoken to Michael McCandless and Simon Willnauer about
>> >>>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>> >>>> I would love to be able to contribute to and learn from the Apache Lucene
>> >>>> community this summer. Also, I would love suggestions on how to make my
>> >>>> application proposal stronger.
>> >>>
>> >>> I think either Simon or I can be the "official" mentor, and then the
>> >>> other one of us (and other Lucene committers) will support/chime
>> >>> in...
>> >>
>> >> I will take the official responsibility here once we are there!
>> >> simon
>> >>>
>> >>> This is an important change for Lucene!
>> >>>
>> >>> Mike
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >>> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> >
>> >
>> > Regards,
>> > Varun Thacker
>> > http://varunthacker.wordpress.com
>> >
>> >
>> >
>> >
>>
>
>
>
> --
>
>
> Regards,
> Varun Thacker
> http://varunthacker.wordpress.com
>
>
>
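
To illustrate the point in the quoted thread about reading byte by byte
versus reading a whole buffer, here is a minimal Java sketch. It goes through
the normal OS buffer cache, not O_DIRECT, so it only shows the per-call
overhead of small reads. The class name, method names, and the 1 MB temporary
file are made up for illustration and are not taken from the pastebin code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadGranularity {

    // Reads the file one byte per call. RandomAccessFile is unbuffered,
    // so each read() is a separate request to the OS.
    static long readByteByByte(Path file) throws IOException {
        long total = 0;
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            while (in.read() != -1) {
                total++;
            }
        }
        return total;
    }

    // Reads the file in chunks of up to bufferSize bytes per call.
    static long readInChunks(Path file, int bufferSize) throws IOException {
        long total = 0;
        byte[] buffer = new byte[bufferSize];
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file: 1 MB of zeros in the temp directory.
        Path file = Files.createTempFile("read-test", ".bin");
        try {
            Files.write(file, new byte[1 << 20]);

            long t0 = System.nanoTime();
            long bytesSlow = readByteByByte(file);
            long t1 = System.nanoTime();
            long bytesFast = readInChunks(file, 4096); // 4 KB, as in the thread
            long t2 = System.nanoTime();

            System.out.println("byte by byte: " + bytesSlow + " bytes, "
                    + (t1 - t0) / 1_000_000 + " ms");
            System.out.println("4 KB chunks:  " + bytesFast + " bytes, "
                    + (t2 - t1) / 1_000_000 + " ms");
        } finally {
            Files.delete(file);
        }
    }
}
```

Both methods return the same byte count; only the number of read calls
differs. With O_DIRECT the gap would presumably be much larger, since each
small read could not be served from the buffer cache.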

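
The IOContext idea from the quoted thread (LUCENE-2793 plumbing that
LUCENE-2795 would later act on) might be sketched as below. This is only an
illustration of the "high level hint" variant: the enum values, the class
shape, and the policy methods are assumptions, not an API that Lucene
defines in this thread. The 4 KB and 1 MB merge buffer sizes come from Mike's
message; the 1 KB default is a placeholder.

```java
public class IOContextSketch {

    // Hypothetical high-level reasons for opening a file, following the
    // "merging, flushing, new reader" details mentioned in the thread.
    enum Purpose { MERGE, FLUSH, NEW_READER, DEFAULT }

    // Hypothetical context object that IndexWriter/IndexReader would fill
    // in and pass down to Directory.createOutput / openInput.
    static final class IOContext {
        final Purpose purpose;

        IOContext(Purpose purpose) {
            this.purpose = purpose;
        }
    }

    // Hypothetical policy: a native directory could bypass the OS buffer
    // cache only for merges, so searches keep their hot pages.
    static boolean bypassBufferCache(IOContext ctx) {
        return ctx.purpose == Purpose.MERGE;
    }

    // Hypothetical policy: pick a buffer size from the context.
    static int bufferSizeFor(IOContext ctx, boolean directIO) {
        if (ctx.purpose == Purpose.MERGE) {
            // 4 KB for buffered merging, about 1 MB with direct IO.
            return directIO ? 1 << 20 : 4096;
        }
        return 1024; // placeholder default, an assumption
    }

    public static void main(String[] args) {
        for (Purpose p : Purpose.values()) {
            IOContext ctx = new IOContext(p);
            boolean direct = bypassBufferCache(ctx);
            System.out.println(p + ": bypassCache=" + direct
                    + ", bufferSize=" + bufferSizeFor(ctx, direct));
        }
    }
}
```

Passing a purpose rather than raw flags leaves each Directory implementation
free to map it to whatever the platform supports, which fits the stated goal
of one UnixDir usable across multiple Unices.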