Hi Kiran,

On Mon, Feb 2, 2009 at 12:28 PM, Kiran Ayyagari <ayyagarikiran@gmail.com> wrote:
hi Alex,

Again, like I said, it's not this simple. I think the JDBM APIs start to fail overall on corruption, depending on how the corruption impacts access to the BTree. One bad access can cause access to half the entries to fail.
yeah, JDBM fails even if a single byte at the end changes (I just removed a character using the vi editor and then started the server, it barfed ;) )

I think your idea would work very well if the journal was well integrated with JDBM at the lowest level.
don't think I understood the 'lowest level' completely

For example, you need to know when a page goes bad and repair the page. You need to be involved with the structures deep in the library to detect problems in them. For example, you changed a byte at the end of a db file and everything got screwed up. To recover from something like this would require adding some code to the RecordManager, BlockIO and PageHeader classes. You basically need to integrate your Journal code into the JDBM library.
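
To make "know when a page goes bad" a bit more concrete, here is a rough sketch of the kind of check that would have to live down at the block level. Everything in it is hypothetical: the ChecksummedPage class, the 8192-byte page size and the trailing CRC32 layout are assumptions for illustration and are not part of JDBM's RecordManager, BlockIO or PageHeader.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper, NOT a JDBM class: a page that carries its own checksum. */
public class ChecksummedPage {
    /** Assumed page size; the last 8 bytes hold the CRC32 of the payload. */
    public static final int PAGE_SIZE = 8192;
    private static final int CRC_BYTES = 8;

    /** Returns the payload followed by its CRC32, ready to be written to disk. */
    public static byte[] seal(byte[] payload) {
        if (payload.length != PAGE_SIZE - CRC_BYTES) {
            throw new IllegalArgumentException("payload must be " + (PAGE_SIZE - CRC_BYTES) + " bytes");
        }
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(PAGE_SIZE).put(payload).putLong(crc.getValue()).array();
    }

    /** True only if the page has the expected size and its stored checksum matches. */
    public static boolean isIntact(byte[] page) {
        if (page.length != PAGE_SIZE) {
            return false; // truncated or over-long page
        }
        CRC32 crc = new CRC32();
        crc.update(page, 0, PAGE_SIZE - CRC_BYTES);
        long stored = ByteBuffer.wrap(page).getLong(PAGE_SIZE - CRC_BYTES);
        return stored == crc.getValue();
    }
}
```

A check like this only detects a bad page. Repairing it would still need the journal (or some other copy of the data) to rebuild the page contents from.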

and one more thing, I think this journal should be a replica of the CL (the CL is more robust with indices, though); this way we can even recover a crashed CL

Hmm, I see what you're thinking. I think we're all having problems drawing distinctions between these various facilities in the server. I know I have wavered myself.

At first I was thinking we should have an extremely simple journal with markers tracking application of operations. Some conversations with Emmanuel then led me to believe that using the CL as the journal was just as good. Now I feel it might not be such a good idea. Let me list my thoughts:

(1) The CL is highly indexed with several db files, which means many writes are needed to persist a record while keeping the CL and its indices consistent. Also, the CL is deep inside the server in the interceptor chain, and many things can go wrong before an operation gets to that point, not to mention the processing time it takes to get there.

(2) The CL should be used for auditing, versioning, snapshotting, and replication. It is fast thanks to indices and will show all the operations that have succeeded in inducing changes. It would get more complicated if we started using it to also capture operations before they have been applied. The semantics would shift.

(3) As you say, the CL itself can get corrupted. For this reason it's not well suited as a journal for all (even in-progress) operations.

I'm seriously thinking the use of the CL for a journal is not a good decision. The journal needs to be fast and simple, doing only one thing and doing it fast and flawlessly.
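
For what it's worth, the "extremely simple journal with markers" could look something like the sketch below. This is only an illustration under stated assumptions: the SimpleJournal class, its method names and the one-line-per-record text format are made up here and do not exist in the server, and operations are assumed to serialize to a single line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical append-only journal: one "OP" record per operation, one "APPLIED" marker once it is done. */
public class SimpleJournal {
    private final Path file;

    public SimpleJournal(Path file) {
        this.file = file;
    }

    /** Records an operation BEFORE it is applied to the partition. */
    public void logOperation(long id, String operation) throws IOException {
        append("OP " + id + " " + operation);
    }

    /** Writes the marker saying the operation with this id was fully applied. */
    public void markApplied(long id) throws IOException {
        append("APPLIED " + id);
    }

    /** On startup: operations that were logged but never marked applied, in log order. */
    public List<String> pendingOperations() throws IOException {
        Map<Long, String> pending = new LinkedHashMap<>();
        if (!Files.exists(file)) {
            return new ArrayList<>();
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split(" ", 3);
            if (parts[0].equals("OP") && parts.length == 3) {
                pending.put(Long.parseLong(parts[1]), parts[2]);
            } else if (parts[0].equals("APPLIED") && parts.length == 2) {
                pending.remove(Long.parseLong(parts[1]));
            }
        }
        return new ArrayList<>(pending.values());
    }

    /** Single sequential write per record; SYNC asks the OS to flush it to the device. */
    private void append(String record) throws IOException {
        Files.write(file, (record + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
    }
}
```

The point of the sketch is that each record costs one sequential append to one file with no indices to keep consistent, which is the contrast with the CL described in (1).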