db-derby-dev mailing list archives

From Jørgen Løland (JIRA) <j...@apache.org>
Subject [jira] Commented: (DERBY-3562) Number of log files (and log dir size) on the slave increases continuously
Date Fri, 28 Mar 2008 09:16:25 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582949#action_12582949 ]

Jørgen Løland commented on DERBY-3562:
--------------------------------------

Mike Matrigali writes:
> I didn't quite follow all of this, and admit I am not up on replication.
> It would be nice if this process used the exact code as the normal
> checkpoint processing. So a checkpoint would be triggered and then,
> after it had done its work, it would do the appropriate cleanup. If you
> do the cleanup too soon then redo recovery of the slave won't work - is
> that expected to work, or at that point do you just restart from scratch
> from the master?

> The existing code that replays multiple checkpoints may be weird, as it
> may assume that this is recovery of a backed-up database that is meant
> to keep all of its log files. Make sure not to break that.

> Is there a concept of a "fully" recoverable slave, i.e. one that is
> supposed to keep all of its log files so that it is recoverable in
> case of a data crash? As I said, this may not be necessary as there is
> always the master. Just good to know what is expected.

Mike,

Thank you for expressing your concerns. I'll do my best to explain why I think the proposed
solution will work.

The patch adds functionality to the checkpoint processing used during recovery (LogToFile#checkpointInRFR).
During recovery, the dirty data pages are flushed to disk, and the log.ctrl file is updated
to point to the new checkpoint currently being processed.

With the patch [1], the log files that are older than the currently processed checkpoint's
Undo Low Water Mark (undo LWM) are then deleted. The undo LWM points to the earliest log record
that may be required to do recovery [2]. Since the log files are processed sequentially and
the data pages have been flushed, the undo LWM in the checkpoint is as valid during recovery
in slave replication mode as it is during normal transaction processing.
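The deletion rule can be sketched roughly as below. This is an illustrative sketch, not Derby's actual code: the class and method names are made up, and it assumes a log instant that packs the log file number into its high 32 bits (as Derby's LogCounter does). Every log file numbered below the file holding the undo LWM is obsolete and can be deleted.

```java
import java.util.ArrayList;
import java.util.List;

public class SlaveLogTruncation {

    // Assumed encoding: a log instant packs (logFileNumber, offset),
    // with the file number in the high 32 bits.
    static long logFileNumber(long logInstant) {
        return logInstant >>> 32;
    }

    // Return the log file numbers that are strictly older than the log
    // file containing the checkpoint's undo LWM; these are no longer
    // needed for recovery on the slave and may be deleted.
    static List<Long> obsoleteLogFiles(List<Long> existingLogFiles,
                                       long undoLWMInstant) {
        long firstNeededFile = logFileNumber(undoLWMInstant);
        List<Long> obsolete = new ArrayList<>();
        for (long fileNo : existingLogFiles) {
            if (fileNo < firstNeededFile) {
                obsolete.add(fileNo);
            }
        }
        return obsolete;
    }

    public static void main(String[] args) {
        // Checkpoint's undo LWM points into log file 7, offset 1024:
        // files 4..6 are obsolete, 7 and 8 must be kept.
        long undoLWM = (7L << 32) | 1024L;
        List<Long> files = List.of(4L, 5L, 6L, 7L, 8L);
        System.out.println(obsoleteLogFiles(files, undoLWM)); // prints [4, 5, 6]
    }
}
```

The point of keying the deletion on the undo LWM rather than the checkpoint instant itself is that the oldest active transaction in the checkpoint's transaction table may still need log records written before the checkpoint.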

Once replication has successfully started, the slave database will always be recoverable [3],
though not in the presence of corrupted data blocks [4]. You may at any time crash the Derby
instance serving the slave database and then reboot it. The used-to-be-slave database will then
recover to a transaction-consistent state that includes the modifications from all transactions
whose commit log record was written to disk on the slave before the crash.

Please follow up if you think I may have misunderstood anything or did not answer your questions
well enough.

[1] The patch only applies to slave replication mode. Backup is not affected, so as not to break
the "fully" recoverable feature for backups.
[2] The first log record of the oldest transaction in the checkpoint's transaction table.
[3] If "fully" recoverable means recovering in presence of corrupted data blocks, this is
currently not supported for replication.
[4] Not including jar files, as explained in DERBY-3552.

> Number of log files (and log dir size) on the slave increases continuously
> --------------------------------------------------------------------------
>
>                 Key: DERBY-3562
>                 URL: https://issues.apache.org/jira/browse/DERBY-3562
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.0.0, 10.5.0.0
>         Environment: -
>            Reporter: Ole Solberg
>            Assignee: Jørgen Løland
>         Attachments: derby-3562-1a.diff, derby-3562-1a.stat, master_slave-db_size-6.jpg
>
>
> I did a simple test inserting tuples in a table during replication:
> The attached file 'master_slave-db_size-6.jpg' shows that 
> the size of the log directory (and number of files in the log directory)
> increases continuously during replication, while on master the size 
> (and number of files) never exceeds ~12Mb (12 files?) in this scenario.
> The seg0 directory on the slave stays at the same size as the master 
> seg0 directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

