directory-dev mailing list archives

From Emmanuel Lécharny
Subject Re: [Txn Layer] WAL flush questions
Date Mon, 19 Mar 2012 17:41:55 GMT
On 3/19/12 6:26 PM, Selcuk AYA wrote:
> On Mon, Mar 19, 2012 at 9:24 AM, Emmanuel Lécharny wrote:
>> Hi,
>> I have a few questions about the handling of the log buffer.
>> When we can't write any more data into the buffer because it's full, we try to
>> flush the buffer to disk. What happens then is:
>> - if there is enough room remaining in the buffer, we write a skip record
>> (with a -1 length): is it necessary? (we then rewind the buffer)
>> - otherwise, we rewind the buffer
>> In any case, we increment the writeAheadRewindCount: what for?
>> Then we call the flush() method, which will be executed only if there is no
>> other thread flushing the buffer already (just in case the sync() method is
>> called by another thread). I guess this is intended to allow a thread to add
>> new data in the buffer while another thread writes the buffer on disk?
>> So AFAIU, only one thread will be allowed to write data into the buffer, up
>> to the point it reaches a record being held by the flush thread, and only
>> one thread can flush the data, up to the point it reaches the last record it
>> can write (which is computed before the flush() method is called).
>> I'm wondering if we couldn't use a simpler algorithm, where we have a flush
>> thread used to flush the data in any case. If the buffer is full, we stop
>> writing until we are signaled that there is some room left (and this is the
>> flush thread role to signal the writer that it can start again). That means
>> we write as much as we can, signaling each record to the flush thread, and
>> the flush thread will consume the records as they arrive. If both are
>> colliding (i.e., no more room remains in the buffer), the reader will have to
>> wait for the writer to wake it up. We won't need to use a buffer at all, we
>> just pass the records (plus their headers and trailers) in a queue, avoiding
>> a copy into temporary memory.
>> This is basically doing the same thing, but we don't wait until the buffer
>> is full to wake up the writer. This is the way the network layer works in
>> NIO, with a selector signaling the writer thread when it's ready to accept
>> some more data to be written.
> I am confused about the buffering (or no buffering) you suggest. Are
> you suggesting that a flush thread will write directly from the user's
> buffer without any in-memory copy?
Yes. In fact, I suggest we buffer the records, without copying them.
When the flush thread is woken up (or kicked), it will write the header,
the buffer, and the footer. We can use ByteBuffer gathering for that (see
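
To make the gathering idea concrete, here is a minimal sketch, not the project's code. It assumes a hypothetical record layout (a 4-byte length header and a 4-byte length footer around the payload) and a hypothetical class name `GatheringLogWrite`. The only real API used is `FileChannel.write(ByteBuffer[])`, which writes several buffers in sequence without first copying them into one contiguous buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class GatheringLogWrite {
    /**
     * Writes header + payload + footer with a gathering write.
     * The payload buffer is handed to the channel as-is (no intermediate copy).
     * The header/footer layout is an assumption made for this sketch.
     */
    static long writeRecord(FileChannel channel, ByteBuffer payload) throws IOException {
        int length = payload.remaining();

        ByteBuffer header = ByteBuffer.allocate(4).putInt(length);
        header.flip();
        ByteBuffer footer = ByteBuffer.allocate(4).putInt(length);
        footer.flip();

        ByteBuffer[] parts = { header, payload, footer };
        long total = 8L + length;
        long written = 0;
        // A single call may write fewer bytes than requested, so loop until done.
        while (written < total) {
            written += channel.write(parts);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wal", ".log");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer userRecord = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            long n = writeRecord(channel, userRecord);
            channel.force(false); // push the data to the storage device
            System.out.println(n + " bytes written to " + file);
        }
    }
}
```

The trade-off is that the user's buffer must not be modified until the flush thread has written it, which is the price of skipping the copy.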
> Currently, things work like this on the common code path:
> * for user threads:
> prepare record
> get log latch
> copy into the in-memory buffer and get the LSN (logical sequence number).
> release log latch
> return LSN
> * for the background flushing thread:
> wake up periodically, reap the in-memory log and write it
> So the background thread does not necessarily wait for the buffer to be
> full to wake up and write. In the hopefully less common case, if the buffer
> is full, a user thread will take one for the team and write the buffer (we
> could signal the flush thread as an alternative here).
> In the common case, this allows user threads to avoid waiting for the write
> and to get an LSN quickly (the LSN is important for ordering log records),
> and it allows batching of writes. Similar algorithms are used in all the
> database WAL code I looked at (including Apache Derby).
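
The code path described in the quoted text above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Txn Layer code: the class `LatchedLogBuffer`, the `sink` callback standing in for the disk write, and the choice of a byte offset as the LSN are all hypothetical. For brevity the latch is held while the sink is called, which a real implementation would likely avoid.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

class LatchedLogBuffer {
    private final ReentrantLock logLatch = new ReentrantLock();
    private final ByteBuffer buffer;
    private final Consumer<byte[]> sink; // stand-in for "write to disk" (assumption)
    private long nextLsn = 0;            // LSN modeled as a byte offset (assumption)

    LatchedLogBuffer(int capacity, Consumer<byte[]> sink) {
        this.buffer = ByteBuffer.allocate(capacity);
        this.sink = sink;
    }

    /** User-thread path: get the latch, copy the record, return its LSN. */
    long append(byte[] record) {
        if (record.length > buffer.capacity()) {
            throw new IllegalArgumentException("record larger than log buffer");
        }
        logLatch.lock();
        try {
            if (buffer.remaining() < record.length) {
                // Less common case: the user thread writes the full buffer itself.
                drainToSink();
            }
            long lsn = nextLsn;
            buffer.put(record);
            nextLsn += record.length;
            return lsn;
        } finally {
            logLatch.unlock();
        }
    }

    /** Background-thread path: called periodically to reap and write the log. */
    void flush() {
        logLatch.lock();
        try {
            drainToSink();
        } finally {
            logLatch.unlock();
        }
    }

    // Must be called with the latch held.
    private void drainToSink() {
        buffer.flip();
        if (buffer.hasRemaining()) {
            byte[] batch = new byte[buffer.remaining()];
            buffer.get(batch);
            sink.accept(batch);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<byte[]> disk = new ArrayList<>();
        LatchedLogBuffer log = new LatchedLogBuffer(8, disk::add);
        long first = log.append(new byte[5]);
        long second = log.append(new byte[5]); // does not fit: triggers an inline write
        log.flush();
        System.out.println("LSNs: " + first + ", " + second + "; batches written: " + disk.size());
    }
}
```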
I have something different in mind to keep the records ordered: inject
them into a queue (as only a single writer will access the queue, the
order will be guaranteed). The flush thread will wait for this
queue to be modified, then flush the data to disk. This queue can contain a
limited number of records, and we can check that the record size does
not exceed a certain amount.

In any case, the flush thread is autonomous, and can either be woken
up when the queue has some data, or wait to be woken up periodically,
or when the queue is full.
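
A minimal sketch of this queue-based idea, assuming a bounded `ArrayBlockingQueue` from the JDK. The class name `QueueFlusher`, the `STOP` sentinel and the `flushed` list (standing in for the disk) are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueFlusher implements Runnable {
    // Sentinel used to stop the flush thread (compared by identity).
    static final ByteBuffer STOP = ByteBuffer.allocate(0);

    private final BlockingQueue<ByteBuffer> queue;
    private final int maxRecordSize;
    final List<ByteBuffer> flushed = new ArrayList<>(); // stand-in for the disk

    QueueFlusher(int maxRecords, int maxRecordSize) {
        this.queue = new ArrayBlockingQueue<>(maxRecords);
        this.maxRecordSize = maxRecordSize;
    }

    /** Writer side: hands the record over without copying it. Blocks while the queue is full. */
    void submit(ByteBuffer record) throws InterruptedException {
        if (record.remaining() > maxRecordSize) {
            throw new IllegalArgumentException("record too large");
        }
        queue.put(record);
    }

    void shutdown() throws InterruptedException {
        queue.put(STOP);
    }

    /** Flush side: woken up as soon as the queue has some data. */
    @Override
    public void run() {
        try {
            while (true) {
                ByteBuffer record = queue.take();
                if (record == STOP) {
                    return;
                }
                flushed.add(record); // a real flush thread would write to the log file here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        QueueFlusher flusher = new QueueFlusher(2, 1024);
        Thread flushThread = new Thread(flusher, "flush-thread");
        flushThread.start();
        for (int i = 0; i < 5; i++) {
            flusher.submit(ByteBuffer.wrap(new byte[] { (byte) i }));
        }
        flusher.shutdown();
        flushThread.join();
        System.out.println(flusher.flushed.size() + " records flushed in submission order");
    }
}
```

With a single writer, the FIFO order of the queue gives the record order. A periodic wake-up could be obtained by replacing `take()` with the timed `poll(timeout, unit)` variant.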

Does it make sense?
Note: I'm not suggesting that we should change the current code, just
trying to provide some food for thought for later improvement...

Emmanuel Lécharny
