directory-dev mailing list archives

From Selcuk AYA <>
Subject Re: [Txn branch] Question about the WAL
Date Thu, 15 Mar 2012 08:12:02 GMT
On Thu, Mar 15, 2012 at 12:11 AM, Emmanuel Lécharny <> wrote:
> On 3/15/12 5:04 AM, Selcuk AYA wrote:
>> Let's continue the discussion here. I got this email at 6 AM my time. I
>> was planning to look at the code and refresh my memory before replying,
>> and I can only do that while I am at home. That is why it took some time
>> to reply. Next time, please allow me some time to reply to your emails.
>> First thing is a general FYI. There is a class called
>> DefaultLogScanner which exposes a getNextRecord method. This class can
>> be used to read the log records.
>> The other thing is I would like to avoid any kind of code reorg or
>> format changes at this point.
> Ok, understood.
>> thanks
>> Selcuk
>> On Wed, Mar 14, 2012 at 1:09 PM, Selcuk AYA<>  wrote:
>>> We already have code to read log records and we do not need a type in
>>> log edits. We do not call this yet as we do not do crash recovery yet.
>>> I saw you already committed some changes for this without waiting for
>>> a reply. Please revert your latest commit.
>>> thanks
>>> Selcuk
>>> On Wed, Mar 14, 2012 at 6:10 AM, Emmanuel Lécharny<>
>>>  wrote:
>>>> Hi,
>>>> as I'm reviewing the way we manage the WAL (Write Ahead Log), I have a
>>>> few questions:
>>>> 1) UserLogRecord
>>>> It's a data structure encapsulating an opaque byte[] containing a
>>>> serialized form of a record. We have two lengths: the serialized data
>>>> length and the buffer length (which might be larger).
>>>> I guess the rationale is that we first allocate a buffer, and we may
>>>> store some smaller data in it. Sounds OK, but the question is why we
>>>> can't simply store a full buffer (i.e. only allocate what we need). Am
>>>> I missing something here?
>> This is mostly an optimization for reading the log (but it could be used
>> for writing to the log as well). When log records are read, the buffer
>> that was used to read the previous record can be reused if it is large
>> enough. Otherwise a new buffer is allocated. This reduces the number of
>> buffer allocations while reading the log.
> IMO, as the log is very unlikely to be read often (except in a crash
> recovery scenario), I don't think it's a good idea to reuse the buffer. That
> would make the log file bigger than necessary.
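
A minimal sketch of the buffer reuse discussed above, assuming a simple length-prefixed record layout. ReusableRecord and its methods are hypothetical names for illustration, not the actual UserLogRecord API. It shows why the record carries two lengths: the backing buffer may be larger than the data it currently holds, and only the first dataLength bytes are valid.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical holder, loosely modelled on the UserLogRecord described above. */
class ReusableRecord {
    private byte[] buffer = new byte[0]; // backing array, may be larger than the data
    private int dataLength;              // number of valid bytes in buffer

    /** Reads one length-prefixed record, allocating only if the buffer is too small. */
    void readFrom(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative record length: " + length);
        }
        if (buffer.length < length) {
            buffer = new byte[length];   // grow: the previous buffer cannot hold this record
        }
        in.readFully(buffer, 0, length); // otherwise the previous buffer is reused as-is
        dataLength = length;
    }

    byte[] getBuffer() { return buffer; }

    int getDataLength() { return dataLength; }
}
```

In this sketch the spare capacity exists only in memory on the reading side; what a writer puts on disk is a separate decision.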
>>>> 2) LogEdit
>>>> When we write LogEdit instances, we have no way to read them back, as
>>>> we don't know whether we have written a TxnChangeState or a
>>>> DataChangeContainer. Even for a DataChangeContainer, which contains a
>>>> list of DataChange (i.e. either IndexModification or
>>>> EntryModification), we have no indication of the written type.
>>>> I think we need to add an identifier at the beginning of the written
>>>> data structure to allow the reader to know which kind of object to
>>>> create. Or, again, I'm missing something (for example, we will always
>>>> know what kind of object we are expecting because they are ordered,
>>>> which is unlikely for index changes, as we will have a variable number
>>>> of modified indexes).
>> The DefaultLog class implements a WAL system and is oblivious to who is
>> using it. When a user log record is added to the log file, the log
>> manager (DefaultLog, not the txn log manager) gets the byte stream,
>> prepends a header and appends a footer to it, and writes it to the
>> log. The header and footer are fixed-size byte streams and include a
>> magic number, checksums and, most importantly, the size of the user log
>> record. When the user calls getNextRecord, the log manager reads a
>> header, then the user log record as an array of bytes using the length
>> stored in the header, and then the footer. It verifies the magic numbers
>> and checksum and then returns the byte array as the next user log
>> record. The client (in our case this would be the txn log manager) can
>> wrap this array in a byte array stream and call readObject() to
>> construct the object. The txn log manager can then check what kind of
>> LogEdit it has with an instanceof check.
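
The framing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual DefaultLog code: the class name FramedLog, the magic values, the field order, and the use of CRC32 are all hypothetical. The header carries a magic number, the payload length, and a checksum; the footer carries a second magic number.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

/** Hypothetical record framing: [magic][length][crc] payload [magic]. */
class FramedLog {
    static final int HEADER_MAGIC = 0x57414C48; // assumed value, for illustration only
    static final int FOOTER_MAGIC = 0x57414C46; // assumed value, for illustration only

    static void writeRecord(DataOutputStream out, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        out.writeInt(HEADER_MAGIC);     // fixed-size header starts here
        out.writeInt(data.length);      // size of the user log record
        out.writeLong(crc.getValue());  // checksum of the payload
        out.write(data);                // opaque user log record
        out.writeInt(FOOTER_MAGIC);     // fixed-size footer
    }

    static byte[] readRecord(DataInputStream in) throws IOException {
        if (in.readInt() != HEADER_MAGIC) {
            throw new IOException("Bad header magic");
        }
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative record length: " + length);
        }
        long expectedCrc = in.readLong();
        byte[] data = new byte[length];
        in.readFully(data);             // payload size comes from the header
        if (in.readInt() != FOOTER_MAGIC) {
            throw new IOException("Bad footer magic");
        }
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        if (crc.getValue() != expectedCrc) {
            throw new IOException("Checksum mismatch");
        }
        return data;                    // still opaque: the caller decides how to decode it
    }
}
```

Note that the reader returns raw bytes only. Nothing in this layer says which class the payload encodes, which is the point under discussion.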
> As you are using a tuned writeExternal() method to write the classes, you
> lose all the information needed to read back an object without knowing its
> type. That means you can no longer do a readObject() on the stream. It
> would be different if the LogEdit instances implemented Serializable
> rather than Externalizable, but as that's not the case, you have to
> provide this information.
>> Since what the txn log manager gets from the log manager is exactly the
>> serialized form of one of its objects, it does not need to add any
>> type information to its log edits. Java handles that for it.
> Not if you used writeExternal().
> And I really think that using writeExternal() is the thing to do: it's 3
> times faster than using writeObject(), and the resulting log will be smaller
> too.

Talked to Emmanuel about this on IRC. I used to use
readObject/writeObject rather than writeExternal/readExternal. Since
the former wrote object type info as well, I had no problem. But
writeExternal/readExternal seems to be faster as per Emmanuel, so it
makes sense to use his suggestion to add type info to the log edits
and deserialize based on the type.
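
A rough sketch of what adding type info could look like, assuming a one-byte tag written ahead of the writeExternal() payload. TxnStateEdit, DataChangeEdit, LogEditCodec, and the tag values are hypothetical stand-ins, not the real log edit classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical stand-in for a transaction state change edit. */
class TxnStateEdit implements Externalizable {
    long txnId;
    public TxnStateEdit() { }
    TxnStateEdit(long txnId) { this.txnId = txnId; }
    @Override public void writeExternal(ObjectOutput out) throws IOException { out.writeLong(txnId); }
    @Override public void readExternal(ObjectInput in) throws IOException { txnId = in.readLong(); }
}

/** Hypothetical stand-in for a data change edit. */
class DataChangeEdit implements Externalizable {
    String entryDn = "";
    public DataChangeEdit() { }
    DataChangeEdit(String entryDn) { this.entryDn = entryDn; }
    @Override public void writeExternal(ObjectOutput out) throws IOException { out.writeUTF(entryDn); }
    @Override public void readExternal(ObjectInput in) throws IOException { entryDn = in.readUTF(); }
}

class LogEditCodec {
    static final byte TXN_STATE = 1;   // assumed tag values
    static final byte DATA_CHANGE = 2;

    static byte[] serialize(Externalizable edit) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Write the type tag first so the reader knows which class to instantiate.
            if (edit instanceof TxnStateEdit) {
                out.writeByte(TXN_STATE);
            } else if (edit instanceof DataChangeEdit) {
                out.writeByte(DATA_CHANGE);
            } else {
                throw new IOException("Unknown edit class: " + edit.getClass());
            }
            edit.writeExternal(out);   // direct call: no class descriptor is written
        }
        return bytes.toByteArray();
    }

    static Externalizable deserialize(byte[] record) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(record))) {
            byte type = in.readByte();
            Externalizable edit;
            switch (type) {
                case TXN_STATE:   edit = new TxnStateEdit();   break;
                case DATA_CHANGE: edit = new DataChangeEdit(); break;
                default: throw new IOException("Unknown log edit type: " + type);
            }
            edit.readExternal(in);     // the instance restores its own fields
            return edit;
        }
    }
}
```

With something like this, the txn log manager would call LogEditCodec.deserialize() on the byte array returned by the log scanner and could still use instanceof on the result.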

> --
> Regards,
> Cordialement,
> Emmanuel Lécharny

