directory-dev mailing list archives

From Emmanuel Lécharny <>
Subject Re: [Txn branch] Question about the WAL
Date Thu, 15 Mar 2012 07:11:37 GMT
On 3/15/12 5:04 AM, Selcuk AYA wrote:
> Let's continue the discussion here. I got this email at 6 AM my time.
> I was planning to look at the code and refresh my memory before
> replying, and I can only do that at home. That is why it took some
> time to reply. Next time, please allow me some time to reply to your
> emails.
> First thing is a general FYI. There is a class called
> DefaultLogScanner which exposes a getNextRecord method. This class can
> be used to read the log records.
> The other thing is I would like to avoid any kind of code reorg or
> format changes at this point.
Ok, understood.
> thanks
> Selcuk
> On Wed, Mar 14, 2012 at 1:09 PM, Selcuk AYA<>  wrote:
>> We already have code to read log records and we do not need a type in
>> log edits. We do not call this yet as we do not do crash recovery yet.
>> I saw you already committed some changes for this without waiting for
>> a reply. Please revert your latest commit.
>> thanks
>> Selcuk
>> On Wed, Mar 14, 2012 at 6:10 AM, Emmanuel Lécharny<>  wrote:
>>> Hi,
>>> As I'm reviewing the way we manage the WAL (Write Ahead Log), I have a few
>>> questions:
>>> 1) UserLogRecord
>>> It's a data structure encapsulating an opaque byte[] containing a serialized
>>> form of a record. We have two lengths: the serialized data length and the
>>> buffer length (which might be larger).
>>> I guess the rationale is that we first allocate a buffer, and we may
>>> store some smaller data in this buffer. Sounds OK, but the question is why
>>> we can't simply store a full buffer (i.e. only allocate what we need)? Am I
>>> missing something here?
> This is mostly an optimization for reading the log (but it could be
> used for writing to the log as well). When log records are read, the
> buffer that was used to read the previous record can be reused if it
> is large enough. Otherwise a new buffer is allocated. This reduces
> the number of buffer allocations while reading the log.

IMO, as the log is very unlikely to be read often (except in a crash 
recovery scenario), I don't think it's a good idea to reuse the buffer. 
That would make the log file bigger than necessary.
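
To make the two lengths concrete, here is a minimal sketch of the read-side reuse described in the quoted paragraph. It is not the actual UserLogRecord code: the class name ReusableRecord, the length-prefixed layout, and the allocation counter are assumptions made for illustration only. The sketch does not settle whether the writer should emit only the data length or the whole buffer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical stand-in for UserLogRecord: one buffer plus the length of the valid data in it. */
class ReusableRecord {
    private byte[] buffer = new byte[0];
    private int dataLength;
    private int allocations; // illustration only: how many times a buffer was allocated

    /** Reads one length-prefixed record, allocating only when the current buffer is too small. */
    void readFrom(ByteBuffer log) {
        int length = log.getInt();
        if (length < 0 || length > log.remaining()) {
            throw new IllegalStateException("bad record length: " + length);
        }
        if (buffer.length < length) {
            buffer = new byte[length];
            allocations++;
        }
        log.get(buffer, 0, length);
        dataLength = length; // may be smaller than buffer.length
    }

    int getDataLength() { return dataLength; }
    int getBufferLength() { return buffer.length; }
    byte[] getData() { return Arrays.copyOf(buffer, dataLength); }

    /** Builds an in-memory log with records of the given sizes and counts allocations while scanning it. */
    static int countAllocations(int... sizes) {
        int total = 0;
        for (int size : sizes) {
            total += 4 + size;
        }
        ByteBuffer log = ByteBuffer.allocate(total);
        for (int size : sizes) {
            log.putInt(size);
            log.put(new byte[size]);
        }
        log.flip();
        ReusableRecord record = new ReusableRecord();
        while (log.hasRemaining()) {
            record.readFrom(log);
        }
        return record.allocations;
    }
}
```

With record sizes 8, 3, 5 and 16, countAllocations returns 2: the 8-byte buffer is reused for the two smaller records, and only the 16-byte record forces a new allocation. After reading the 3-byte record, the data length is 3 while the buffer length is still 8.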
>>> 2) LogEdit
>>> When we write the LogEdit instance, we have no way to read them back as we
>>> don't know if we have written a TxnChangeState or a DataChangeContainer.
>>> Even for a DataChangeContainer, which contains a list of DataChange (ie
>>> either IndexModification or EntryModification), we have no indication about
>>> the written type.
>>> I think we need to add an identifier at the beginning of the written data
>>> structure to allow the reader to know which kind of object to create. Or,
>>> again, I'm missing something (like we will always know what kind of object
>>> we are expecting, because they are ordered; that seems unlikely for
>>> indexChange, as we will have a variable number of modified indexes).
> The DefaultLog class implements a WAL system and is oblivious to who
> is using it. When a user log record is added to the log file, the log
> manager (DefaultLog, not the txn log manager) gets the byte stream,
> prepends a header and appends a footer to it, and writes it to the
> log. The header and footer are fixed-size byte streams and include a
> magic number, checksums and, most importantly, the size of the user
> log record. When the user calls getNextRecord, the log manager reads
> a header, then the user log record as an array of bytes using the
> length stored in the header, and then the footer. It verifies the
> magic numbers and checksum and then returns the byte array as the
> next user log record. The client (in our case this would be
> txnlogmanager) can form a byte array stream on this array and call
> readObject() to construct the object. The txn log manager can then
> check what kind of LogEdit it has by doing an instanceof check.
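
The framing described in the quoted paragraph can be sketched as follows. This is only an illustration under stated assumptions, not the DefaultLog format: the class name FramedLog, the magic values, the field order (magic and length in the header, checksum and magic in the footer) and the use of CRC32 are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical framing: [magic:int][length:int] payload [crc32:long][magic:int]. */
class FramedLog {
    static final int HEADER_MAGIC = 0xCAFE0001; // assumed value
    static final int FOOTER_MAGIC = 0xCAFE0002; // assumed value
    static final int HEADER_SIZE = 4 + 4;
    static final int FOOTER_SIZE = 8 + 4;
    static final int OVERHEAD = HEADER_SIZE + FOOTER_SIZE;

    private static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Wraps an opaque user record in a fixed-size header and footer. */
    static byte[] frame(byte[] userRecord) {
        ByteBuffer out = ByteBuffer.allocate(OVERHEAD + userRecord.length);
        out.putInt(HEADER_MAGIC).putInt(userRecord.length);
        out.put(userRecord);
        out.putLong(checksum(userRecord)).putInt(FOOTER_MAGIC);
        return out.array();
    }

    /** Reads header, payload and footer, verifies them, and returns the payload bytes. */
    static byte[] nextRecord(ByteBuffer log) {
        if (log.getInt() != HEADER_MAGIC) {
            throw new IllegalStateException("bad header magic");
        }
        int length = log.getInt();
        if (length < 0 || length > log.remaining() - FOOTER_SIZE) {
            throw new IllegalStateException("bad record length: " + length);
        }
        byte[] userRecord = new byte[length];
        log.get(userRecord);
        long storedChecksum = log.getLong();
        if (log.getInt() != FOOTER_MAGIC) {
            throw new IllegalStateException("bad footer magic");
        }
        if (storedChecksum != checksum(userRecord)) {
            throw new IllegalStateException("checksum mismatch");
        }
        return userRecord;
    }
}
```

The log layer here never looks inside the payload: nextRecord hands back exactly the bytes that were passed to frame, and it is up to the caller to decide how to turn them back into an object.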
As you are using a custom writeExternal() method to write the classes,
you lose all the information needed to read back an object without
knowing its type. That means you can no longer do a readObject() on the
stream. That would be different if the LogEdit instances were not
implementing Externalizable but Serializable; as that is not the case,
you have to provide this information.
> Since what txnlogmanager gets from the log manager is exactly the
> serialized form of one of its objects, it does not need to add any
> type information to its log edits. Java handles that for it.
Not if you used writeExternal().

And I really think that using writeExternal() is the thing to do: it's
3 times faster than using writeObject(), and the resulting log will be
smaller too.
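
Here is a minimal sketch of the identifier I have in mind, assuming writeExternal() is called directly on the stream rather than through writeObject(). In that case only the fields the method writes reach the stream, so the reader needs a tag to pick the class to instantiate. The names TaggedLogEdits, TxnStateEdit and DataChangeEdit, their fields, and the tag values are hypothetical and much simpler than the real LogEdit classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class TaggedLogEdits {
    static final byte TXN_STATE = 1;   // assumed tag values
    static final byte DATA_CHANGE = 2;

    /** Hypothetical, simplified transaction state edit. */
    static class TxnStateEdit implements Externalizable {
        long txnId;
        public TxnStateEdit() { }
        TxnStateEdit(long txnId) { this.txnId = txnId; }
        public void writeExternal(ObjectOutput out) throws IOException { out.writeLong(txnId); }
        public void readExternal(ObjectInput in) throws IOException { txnId = in.readLong(); }
    }

    /** Hypothetical, simplified data change edit. */
    static class DataChangeEdit implements Externalizable {
        String entryDn = "";
        public DataChangeEdit() { }
        DataChangeEdit(String entryDn) { this.entryDn = entryDn; }
        public void writeExternal(ObjectOutput out) throws IOException { out.writeUTF(entryDn); }
        public void readExternal(ObjectInput in) throws IOException { entryDn = in.readUTF(); }
    }

    private static byte tagOf(Externalizable edit) {
        if (edit instanceof TxnStateEdit) return TXN_STATE;
        if (edit instanceof DataChangeEdit) return DATA_CHANGE;
        throw new IllegalArgumentException("unknown edit type: " + edit.getClass());
    }

    /** Writes a one-byte type tag, then the fields via a direct writeExternal() call. */
    static byte[] write(Externalizable edit) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeByte(tagOf(edit));
            edit.writeExternal(out); // no class descriptor is written here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the tag, creates the matching object, and lets it read its own fields. */
    static Externalizable read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            byte tag = in.readByte();
            Externalizable edit;
            switch (tag) {
                case TXN_STATE: edit = new TxnStateEdit(); break;
                case DATA_CHANGE: edit = new DataChangeEdit(); break;
                default: throw new IOException("unknown log edit tag: " + tag);
            }
            edit.readExternal(in);
            return edit;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the tag in place, the reader can still do an instanceof check on the result of read(), which is the behaviour the txn log manager expects, without paying for the class descriptor that writeObject() would add.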

Emmanuel Lécharny
