directory-dev mailing list archives

From Selcuk AYA <ayasel...@gmail.com>
Subject Re: [Txn Layer] WAL flush questions
Date Mon, 19 Mar 2012 17:27:51 GMT
On Mon, Mar 19, 2012 at 10:26 AM, Selcuk AYA <ayaselcuk@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 9:24 AM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
>> Hi,
>>
>> I have a few questions about the handling of the log buffer.
>>
>> When we can't write any more data into the buffer because it's full, we try
>> to flush the buffer to disk. What happens then is:
>> - if there is enough room remaining in the buffer, we write a skip record
>> (with a -1 length): is it necessary? (we then rewind the buffer)
>> - otherwise, we rewind the buffer
>>
>> In any case, we increment the writeAheadRewindCount: what for?

As far as I can remember, writeAheadRewindCount was there to avoid
overwriting unflushed log records when the in-memory circular buffer
wraps. If this answer is not good enough, I can take a closer look
later.
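
To illustrate the idea (this is only a sketch from memory of the concept,
not the actual code): if the writer and the flusher each count how many
times they have wrapped, then equal offsets with equal counts mean "empty"
and equal offsets with the writer one lap ahead mean "full". The class name,
the fields and the lapEnd marker below are all hypothetical; lapEnd simply
records where valid data stops in the lap the writer just left.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a circular log buffer guarded by rewind counts. */
final class RewindCountingBuffer {
    private final byte[] buffer;
    private int writePos;      // next byte the writer will fill
    private int flushPos;      // next byte the flusher will write out
    private int lapEnd;        // end of valid data in the lap the writer left
    private long writeRewinds; // how many times the writer has wrapped
    private long flushRewinds; // how many times the flusher has wrapped

    RewindCountingBuffer(int capacity) {
        this.buffer = new byte[capacity];
    }

    /** Appends a record; returns false if that would overwrite unflushed bytes. */
    synchronized boolean append(byte[] record) {
        if (writeRewinds == flushRewinds) {
            // Same lap: the writer is at or ahead of the flusher.
            if (writePos + record.length > buffer.length) {
                // No room at the tail. Rewinding is only safe if the record
                // fits in front of the flusher's position.
                if (record.length > flushPos) {
                    return false;
                }
                lapEnd = writePos;
                writePos = 0;
                writeRewinds++;
            }
        } else if (writePos + record.length > flushPos) {
            // Writer is one lap ahead and must not pass the flusher.
            return false;
        }
        System.arraycopy(record, 0, buffer, writePos, record.length);
        writePos += record.length;
        return true;
    }

    /** Writes all unflushed bytes to the sink and returns how many were written. */
    synchronized int flush(ByteArrayOutputStream sink) {
        int flushed = 0;
        if (flushRewinds < writeRewinds) {
            // Drain the tail of the old lap first, then follow the writer.
            sink.write(buffer, flushPos, lapEnd - flushPos);
            flushed += lapEnd - flushPos;
            flushPos = 0;
            flushRewinds++;
        }
        sink.write(buffer, flushPos, writePos - flushPos);
        flushed += writePos - flushPos;
        flushPos = writePos;
        return flushed;
    }
}
```

Without the two counters, writePos == flushPos would be ambiguous, and a
wrapped writer could not tell whether the bytes in front of it had already
reached the disk.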

>>
>> then we call the flush() method, which will be executed only if there is no
>> other thread flushing the buffer already (just in case the sync() method is
>> called by another thread). I guess this is intended to allow a thread to add
>> new data in the buffer while another thread writes the buffer on disk?
>>
>> So AFAIU, only one thread will be allowed to write data into the buffer, up
>> to the point it reaches a record being held by the flush thread, and only
>> one thread can flush the data, up to the point it reaches the last record it
>> can write (which is computed before the flush() method is called).
>>
>> I'm wondering if we couldn't use a simpler algorithm, where we have a flush
>> thread used to flush the data in any case. If the buffer is full, we stop
>> writing until we are signaled that there is some room left (and it is the
>> flush thread's role to signal the writer that it can start again). That
>> means we write as much as we can, signaling each record to the flush thread,
>> and the flush thread will consume the records as they arrive. If both are
>> colliding (i.e., no more room remains in the buffer), the reader will have to
>> wait for the writer to wake it up. We won't need to use a buffer at all, we
>> just pass the records (plus their headers and trailers) in a queue, avoiding
>> a copy into temporary memory.
>>
>> This is basically doing the same thing, but we don't wait until the buffer
>> is full to wake up the writer. This is the way the network layer works in
>> NIO, with a selector signaling the writer thread when it's ready to accept
>> some more data to be written.
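
One possible reading of this proposal, as a rough sketch only: user threads
hand each record to a bounded queue and block while it is full, and a single
flush thread takes records as they arrive. QueueingLog, the STOP sentinel and
the ByteArrayOutputStream standing in for the log file are all made up for
the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: records are queued, a dedicated thread flushes them. */
final class QueueingLog {
    // Sentinel compared by identity, so an empty user record is still valid.
    private static final byte[] STOP = new byte[0];

    private final BlockingQueue<byte[]> queue;
    // Stand-in for the log file; only the flush thread writes to it.
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();
    private final Thread flusher;

    QueueingLog(int maxPendingRecords) {
        queue = new ArrayBlockingQueue<>(maxPendingRecords);
        flusher = new Thread(this::drain, "log-flusher");
        flusher.start();
    }

    /** Called by user threads; blocks while the queue is full. */
    void log(byte[] record) throws InterruptedException {
        queue.put(record);
    }

    /** Flush thread loop: consume records in arrival order until STOP. */
    private void drain() {
        try {
            for (byte[] r = queue.take(); r != STOP; r = queue.take()) {
                disk.write(r, 0, r.length);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the flush thread after pending records and returns what was written. */
    byte[] close() throws InterruptedException {
        queue.put(STOP);
        flusher.join(); // join() makes the flusher's writes visible here
        return disk.toByteArray();
    }
}
```

Note that the queue holds references to the caller's arrays, so the caller
must not reuse a record after logging it, and this sketch hands out no LSN.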
>
> I am confused about the buffering (or no buffering) you suggest. Are
> you suggesting that the flush thread will write directly from the user's
> buffer without any in-memory copy?
>
> Currently things work like this on the common code path:
>
> * for user threads:
> prepare record
> get log latch
> copy into in-memory buffer and get LSN (logical sequence number)
> release log latch
> return LSN
>
>
> * for background flushing thread:
> wake up periodically, reap the in-memory log and write
>
> So the background thread does not necessarily wait for the buffer to be
> full to wake up and write. In the hopefully less common case, if the buffer
> is full, a user thread will take one for the team and write the buffer (we
> could signal the flush thread as an alternative here).
>
> In the common case, this allows user threads to avoid waiting for the write
> and to get an LSN quickly (the LSN is important for ordering log records),
> and it allows batching of writes. Similar algorithms are used in all the
> database WAL code I looked at (including Apache Derby).
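
The code path above could be sketched roughly as follows. This is a
simplified illustration, not the real implementation: LatchedLog and its
members are hypothetical, the LSN is assumed to be the record's byte offset
in the log, and the latch is held during the write, which the real code
avoids so that appends can proceed while a flush is in progress.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the latch / copy / LSN path with batched flushes. */
final class LatchedLog {
    private final ReentrantLock latch = new ReentrantLock();
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream(); // in-memory log
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();   // stand-in for the file
    private final int capacity;
    private long nextLsn; // assumption: LSN = byte offset of the record

    LatchedLog(int capacity) {
        this.capacity = capacity;
    }

    /** User thread path: get latch, copy record, take LSN, release latch. */
    long append(byte[] record) {
        latch.lock();
        try {
            if (memory.size() + record.length > capacity) {
                // Less common case: buffer full, the user thread flushes itself.
                flushLocked();
            }
            long lsn = nextLsn;
            memory.write(record, 0, record.length);
            nextLsn += record.length;
            return lsn;
        } finally {
            latch.unlock();
        }
    }

    /** Background thread path: reap whatever has accumulated and write it. */
    void flush() {
        latch.lock();
        try {
            flushLocked();
        } finally {
            latch.unlock();
        }
    }

    /** Caller must hold the latch. Writes the whole batch in one call. */
    private void flushLocked() {
        byte[] pending = memory.toByteArray();
        disk.write(pending, 0, pending.length);
        memory.reset();
    }

    /** Number of bytes that have reached the stand-in disk. */
    long flushedBytes() {
        latch.lock();
        try {
            return disk.size();
        } finally {
            latch.unlock();
        }
    }
}
```

A background thread would call flush() on a timer, for example from a
ScheduledExecutorService, so that several appended records go out in one
write.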
>
>>
>> Thoughts?
>>
>> --
>> Regards,
>> Cordialement,
>> Emmanuel Lécharny
>> www.iktek.com
>>
>
> thanks
> Selcuk
