Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Message-ID: <82270663.1206695785044.JavaMail.jira@brutus>
Date: Fri, 28 Mar 2008 02:16:25 -0700 (PDT)
From: Jørgen Løland (JIRA) <jira@apache.org>
To: derby-dev@db.apache.org
Subject: [jira] Commented: (DERBY-3562) Number of log files (and log dir
 size) on the slave increases continuously
In-Reply-To: <372561205.1206042085162.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/DERBY-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582949#action_12582949 ]

Jørgen Løland commented on DERBY-3562:
--------------------------------------

Mike Matrigali writes:
> I didn't quite follow all of this, and admit I am not up on replication.
> It would be nice if this process used the exact same code as the normal
> checkpoint processing.  So a checkpoint would be triggered and then
> after it had done its work it would do the appropriate cleanup.  If you
> do the cleanup too soon then redo recovery of the slave won't work - is
> that expected to work, or at that point do you just restart from scratch
> from the master?

> The existing code that replays multiple checkpoints may be weird as it
> may assume that this is recovery of a backed up database that is meant
> to keep all of its log files.  Make sure to not break that.

> Is there a concept of a "fully" recoverable slave, i.e. one that is
> supposed to keep all of its log files so that it is recoverable in
> case of a data crash?  As I said, this may not be necessary as there is
> always the master.  Just good to know what is expected.

Mike,

Thank you for expressing your concerns. I'll do my best to explain why I think the proposed solution will work.

The patch adds functionality to the checkpoint processing used during recovery (LogToFile#checkpointInRFR). During recovery, the dirty data pages are flushed to disk, and the log.ctrl file is updated to point to the new checkpoint currently being processed.

With the patch [1], the log files that are older than the currently processed checkpoint's Undo Low Water Mark (undo LWM) are then deleted. The undo LWM points to the earliest log record that may be required to do recovery [2]. Since the log files are processed sequentially and the data pages have been flushed, the undo LWM in the checkpoint is as valid during recovery (i.e. slave replication mode) as it is during normal transaction processing.
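
To make the deletion rule concrete, here is a minimal sketch of the idea. It is not the code in the patch, and the class and method names (UndoLwmSketch, LogPosition, undoLwm, deletableFiles) are hypothetical rather than Derby's internal API. It assumes log files are identified by increasing numbers and that, when the checkpoint's transaction table is empty, the checkpoint's own position serves as the mark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are NOT Derby's internal API.
class UndoLwmSketch {

    // A position in the log: the number of the log file plus an offset inside it.
    static final class LogPosition {
        final long fileNumber;
        final long offset;

        LogPosition(long fileNumber, long offset) {
            this.fileNumber = fileNumber;
            this.offset = offset;
        }

        boolean isBefore(LogPosition other) {
            return fileNumber < other.fileNumber
                    || (fileNumber == other.fileNumber && offset < other.offset);
        }
    }

    // Undo LWM: the first log record of the oldest transaction in the
    // checkpoint's transaction table. Assumption: when the table is empty,
    // the checkpoint's own position is used as the mark.
    static LogPosition undoLwm(List<LogPosition> firstRecords, LogPosition checkpoint) {
        LogPosition lwm = checkpoint;
        for (LogPosition first : firstRecords) {
            if (first.isBefore(lwm)) {
                lwm = first;
            }
        }
        return lwm;
    }

    // Files numbered strictly below the file holding the undo LWM are no longer
    // needed for recovery; the file containing the mark itself must be kept.
    static List<Long> deletableFiles(List<Long> existingFiles, LogPosition undoLwm) {
        List<Long> deletable = new ArrayList<>();
        for (long fileNumber : existingFiles) {
            if (fileNumber < undoLwm.fileNumber) {
                deletable.add(fileNumber);
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        LogPosition checkpoint = new LogPosition(7, 100);
        List<LogPosition> firstRecords =
                List.of(new LogPosition(5, 40), new LogPosition(6, 10));
        LogPosition lwm = undoLwm(firstRecords, checkpoint);
        System.out.println("files that could be deleted: "
                + deletableFiles(List.of(3L, 4L, 5L, 6L, 7L), lwm));
    }
}
```

In this sketch the oldest active transaction started in file 5, so only files 3 and 4 qualify for deletion, even though the checkpoint itself sits in file 7.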

Once replication has successfully started, the slave database will always be recoverable [3], but not in case of corrupted data blocks [4]. You may at any time crash the Derby instance serving the slave database and then reboot it. The used-to-be-slave database will then recover to a transaction consistent state including the modifications from all transactions whose commit log record was written to disk on the slave before the crash.
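
As a rough illustration of that guarantee (again a sketch with hypothetical names, not Derby's recovery code): given the log records that reached the slave's disk before the crash, the transactions whose changes survive recovery are exactly those with a commit record among them.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these names are illustrative and are NOT Derby's recovery code.
class CommittedAfterCrashSketch {

    // A simplified log record: which transaction wrote it, and whether it is a commit.
    static final class LogRecord {
        final long txnId;
        final boolean isCommit;

        LogRecord(long txnId, boolean isCommit) {
            this.txnId = txnId;
            this.isCommit = isCommit;
        }
    }

    // Transactions whose commit record is on disk keep their modifications;
    // every other transaction is treated as uncommitted and rolled back.
    static Set<Long> survivingTransactions(List<LogRecord> recordsOnDisk) {
        Set<Long> committed = new LinkedHashSet<>();
        for (LogRecord record : recordsOnDisk) {
            if (record.isCommit) {
                committed.add(record.txnId);
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        // Transaction 1 committed before the crash; transaction 2 did not.
        List<LogRecord> onDisk = List.of(
                new LogRecord(1, false),
                new LogRecord(2, false),
                new LogRecord(1, true));
        System.out.println("transactions kept after recovery: "
                + survivingTransactions(onDisk));
    }
}
```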

Please follow up if you think I may have misunderstood anything or did not answer your questions well enough.

[1] The patch only applies to slave replication mode. Backup is not affected, so as not to break the "fully" recoverable feature for backups.
[2] The first log record of the oldest transaction in the checkpoint's transaction table.
[3] If "fully" recoverable means recovering in presence of corrupted data blocks, this is currently not supported for replication.
[4] Not including jar files, as explained in DERBY-3552.

> Number of log files (and log dir size) on the slave increases continuously
> --------------------------------------------------------------------------
>
>                 Key: DERBY-3562
>                 URL: https://issues.apache.org/jira/browse/DERBY-3562
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.0.0, 10.5.0.0
>         Environment: -
>            Reporter: Ole Solberg
>            Assignee: Jørgen Løland
>         Attachments: derby-3562-1a.diff, derby-3562-1a.stat, master_slave-db_size-6.jpg
>
>
> I did a simple test inserting tuples in a table during replication:
> The attached file 'master_slave-db_size-6.jpg' shows that
> the size of the log directory (and number of files in the log directory)
> increases continuously during replication, while on master the size
> (and number of files) never exceeds ~12 MB (12 files?) in this scenario.
> The seg0 directory on the slave stays at the same size as the master
> seg0 directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.