directory-dev mailing list archives

From Selcuk AYA <>
Subject Re: [Txn Layer] WAL flush questions
Date Mon, 19 Mar 2012 18:38:12 GMT
On Mon, Mar 19, 2012 at 11:32 AM, Emmanuel Lécharny <> wrote:
> On 3/19/12 6:59 PM, Selcuk AYA wrote:
>> On Mon, Mar 19, 2012 at 10:41 AM, Emmanuel Lécharny<>
>>  wrote:
>>> On 3/19/12 6:26 PM, Selcuk AYA wrote:
>>>> On Mon, Mar 19, 2012 at 9:24 AM, Emmanuel Lécharny<>
>>>>  wrote:
>>>>> Hi,
>>>>> I have a few questions about the handling of the log buffer.
>>>>> When we can't write any more data into the buffer because it's full,
>>>>> we try to flush the buffer to disk. What happens then is:
>>>>> - if there is enough room remaining in the buffer, we write a skip
>>>>>   record (with a -1 length): is it necessary? (we then rewind the
>>>>>   buffer)
>>>>> - otherwise, we rewind the buffer
>>>>> In any case, we increment the writeAheadRewindCount: what for?
>>>>> Then we call the flush() method, which will be executed only if no
>>>>> other thread is already flushing the buffer (just in case the sync()
>>>>> method is called by another thread). I guess this is intended to allow
>>>>> a thread to add new data to the buffer while another thread writes the
>>>>> buffer to disk?
>>>>> So AFAIU, only one thread will be allowed to write data into the
>>>>> buffer, up to the point where it reaches a record being held by the
>>>>> flush thread, and only one thread can flush the data, up to the point
>>>>> where it reaches the last record it can write (which is computed
>>>>> before the flush() method is called).
>>>>> I'm wondering if we couldn't use a simpler algorithm, where we have a
>>>>> flush thread used to flush the data in any case. If the buffer is
>>>>> full, we stop writing until we are signaled that there is some room
>>>>> left (and it is the flush thread's role to signal the writer that it
>>>>> can start again). That means we write as much as we can, signaling
>>>>> each record to the flush thread, and the flush thread will consume the
>>>>> records as they arrive. If both are colliding (i.e., no more room
>>>>> remains in the buffer), the reader will have to wait for the writer to
>>>>> wake it up. We won't need to use a buffer at all, we just pass the
>>>>> records (plus their headers and trailers) in a queue, avoiding a copy
>>>>> into temporary memory.
>>>>> This is basically doing the same thing, but we don't wait until the
>>>>> buffer is full to wake up the writer. This is the way the network
>>>>> layer works in NIO, with a selector signaling the writer thread when
>>>>> it's ready to accept some more data to be written.
>>>> I am confused about the buffering (or no buffering) you suggest. Are
>>>> you suggesting that a flush thread will write directly from the user's
>>>> buffer without any in-memory copy?
>>> Yes. In fact, I suggest we buffer the records, without copying them. When
>>> the flush thread is woken up (or kicked), it will write the header, the
>>> buffer, and the footer. We can use ByteBuffer gathering for that (see
>> I see. But this is effectively what we are doing, right? Instead of
>> putting the buffers in a queue and doing scatter/gather through byte
>> buffers (which will eventually do a memcpy to do a single batched write,
>> I think), we copy into an in-memory buffer and let the flushing thread
>> do the single batched write.
> Yes, but you copy the user records into a temporary ByteBuffer, which will
> be read and flushed. If you put the user records in a queue, you don't need
> this extra copy, plus you don't need to allocate a 4 MB buffer at all. That
> does not mean you won't use those 4 MB if the queue is not emptied fast
> enough by the flush thread, but in the general case, you just end up using
> less memory if the flush thread is awakened when some data is present in
> the queue.
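
A minimal sketch of the queue-based hand-off described above, assuming a
bounded queue so that the producer blocks when it is full and the flush
thread blocks when it is empty. The class name QueueFlushSketch, the methods
log() and close(), and the POISON end marker are all hypothetical and only
there for illustration; this is not the current txn layer code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueFlushSketch {
    // Hypothetical end-of-stream marker, compared by identity only.
    private static final ByteBuffer POISON = ByteBuffer.allocate(0);

    private final BlockingQueue<ByteBuffer> queue;
    private final Thread flusher;

    public QueueFlushSketch(WritableByteChannel out, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        flusher = new Thread(() -> {
            try {
                // take() blocks while the queue is empty, so the flush thread
                // is woken as soon as a record arrives, not when a buffer fills.
                for (ByteBuffer rec = queue.take(); rec != POISON; rec = queue.take()) {
                    while (rec.hasRemaining()) {
                        out.write(rec);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        flusher.start();
    }

    // Wraps the user's array without copying it; put() blocks when the queue is full.
    public void log(byte[] record) throws InterruptedException {
        queue.put(ByteBuffer.wrap(record));
    }

    // Lets the flush thread drain everything queued so far, then stops it.
    public void close() throws InterruptedException {
        queue.put(POISON);
        flusher.join();
    }

    public static void main(String[] args) throws InterruptedException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        QueueFlushSketch wal = new QueueFlushSketch(Channels.newChannel(sink), 2);
        wal.log(new byte[] {1, 2});
        wal.log(new byte[] {3});
        wal.close();
        System.out.println(sink.size() + " bytes flushed");
    }
}
```

Note that because log() only wraps the array, the caller must not modify a
record until it has been flushed; that is the price of skipping the copy.
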
So we want to append a batched write to the end of the log using a "single"
IO. What I am saying is: won't the Java ByteBuffer implementation have to
internally copy the buffers into a single buffer and do a single batched
write from that buffer?
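
To make the question concrete, here is a rough sketch of what the gathering
write would look like with FileChannel.write(ByteBuffer[]), which is the
standard JDK gathering call. The class name, appendRecord(), and the framing
(a 4-byte length header and a 4-byte length footer) are assumptions made up
for the example, not the actual log record format. As far as I know, whether
the JDK copies internally depends on the buffer type: heap buffers such as
the wrapped array below are typically staged through a temporary direct
buffer, while direct buffers can be handed to the OS as they are.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteSketch {
    // Hypothetical framing: 4-byte length header, payload, 4-byte length footer.
    public static long appendRecord(FileChannel channel, byte[] userRecord) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4).putInt(userRecord.length);
        header.flip();
        ByteBuffer payload = ByteBuffer.wrap(userRecord); // wraps the array, no copy here
        ByteBuffer footer = ByteBuffer.allocate(4).putInt(userRecord.length);
        footer.flip();

        ByteBuffer[] parts = { header, payload, footer };
        long total = 8L + userRecord.length;
        long written = 0;
        // A single call may write fewer bytes than requested, so loop until done.
        while (written < total) {
            written += channel.write(parts);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal", ".log");
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            long n = appendRecord(ch, "hello".getBytes(StandardCharsets.US_ASCII));
            ch.force(false); // push the data to disk, as a log sync would
            System.out.println(n + " bytes appended");
        } finally {
            Files.delete(log);
        }
    }
}
```
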

> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
