Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Message-ID: <10124960.1184583784841.JavaMail.jira@brutus>
Date: Mon, 16 Jul 2007 04:03:04 -0700 (PDT)
From: "V.Narayanan (JIRA)" <jira@apache.org>
To: derby-dev@db.apache.org
Subject: [jira] Commented: (DERBY-2872) Add Replication functionality to
 Derby
In-Reply-To: <16039538.1182851726399.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/DERBY-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512911 ]

V.Narayanan commented on DERBY-2872:
------------------------------------

>Great that you guys are running with this one! Some comments to the
>functional specification:

Thank you for the reviews and comments Dag.

>* Derby doesn't log all operations by default, e.g. bulk
>  import, deleting all records from a table, creating an index.  These
>  issues were addressed in the work on online backup (DERBY-239),
>  partially by denying online backup if non-logged operations are not
>  yet committed, and partly by making them do logging when online
>  backup is in effect (the reason for not logging some operations is
>  performance). I guess for replication you would need to make them
>  do logging for the duration.

I agree, you are correct.

I thought I could get some idea of how this could be done by going through
the patches for DERBY-239.

I read through DERBY-239 and found patches
onlinebackup(3&7).diff to be of interest to us.

The primary motivation of 3 was the following:

"To make a consistent online backup in this scenario, this patch:

1) blocks online backup until all the transactions with unlogged operation are
    committed/aborted.
2) implicitly converts all unlogged operations to logged mode for the duration
    of the online backup, if they are started when backup is in progress. "

7 addressed comments on 3.
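
For replication, the same two rules could be sketched roughly as follows.
This is only an illustration of the idea quoted above, not code taken from
the DERBY-239 patches. The class UnloggedOpGate and all of its method names
are made up for this mail.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Derby code: tracks transactions that have run
 * unlogged operations and forces logging while replication is active.
 */
class UnloggedOpGate {
    private final Set<Long> openUnloggedTxns = new HashSet<>();
    private boolean forceLogging = false;

    /**
     * Called before an operation that is normally unlogged (e.g. bulk import).
     * Returns true if the operation has to be logged after all (rule 2).
     */
    synchronized boolean mustLog(long txnId) {
        if (forceLogging) {
            return true;
        }
        openUnloggedTxns.add(txnId); // remember the open unlogged transaction
        return false;
    }

    /** Called when a transaction commits or aborts. */
    synchronized void transactionEnded(long txnId) {
        openUnloggedTxns.remove(txnId);
        notifyAll(); // wake up a caller waiting in startReplication()
    }

    /** Rule 1: block until no transaction with unlogged operations is open. */
    synchronized void startReplication() {
        forceLogging = true; // operations started from now on are logged
        while (!openUnloggedTxns.isEmpty()) {
            try {
                wait();
            } catch (InterruptedException e) {
                forceLogging = false;
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    synchronized void stopReplication() {
        forceLogging = false;
    }

    synchronized int openUnloggedCount() {
        return openUnloggedTxns.size();
    }
}
```

Setting the flag before waiting is a choice made for this sketch, so that
new unlogged operations cannot keep the start command waiting forever.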

>* Overview of characteristics: you mention the network line as a
>  single point of failure, that's fine in a first version. One could
>  imagine having the replication service support more network interfaces
>  to alleviate this vulnerability.

I agree!

Some random thoughts on the modifications that would be required:

When the master or slave has multiple network interfaces, each interface
can be reached through the IP address it is bound to.

* The start replication command on the master and slave
  will have to be modified to accept the multiple IP addresses
  of the peer.

* The log sender should be capable of detecting failure to send
  to one and switch to sending to the other.

* The log receiver should be modified to be able to listen on all of
  its interfaces.
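
To make the log sender point a bit more concrete, here is a minimal sketch
of the switch-over logic. It is an assumption about how this could look, not
an existing Derby class: LogChannel and FailoverLogSender are hypothetical
names, and a LogChannel stands for one network path (one peer IP address).

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical abstraction of one network path to the peer (not a Derby API). */
interface LogChannel {
    void send(byte[] logChunk) throws IOException;
}

/** Hypothetical sketch: ships log chunks, switching paths when one fails. */
class FailoverLogSender {
    private final List<LogChannel> channels; // one per peer IP address
    private int current = 0;                 // index of the path in use

    FailoverLogSender(List<LogChannel> channels) {
        if (channels.isEmpty()) {
            throw new IllegalArgumentException("no peer addresses given");
        }
        this.channels = channels;
    }

    /**
     * Tries the current path first, then each remaining path once.
     * Returns false if the chunk could not be sent on any path.
     */
    boolean send(byte[] logChunk) {
        for (int attempt = 0; attempt < channels.size(); attempt++) {
            try {
                channels.get(current).send(logChunk);
                return true;
            } catch (IOException e) {
                // Failure detected: fail over to the next path.
                current = (current + 1) % channels.size();
            }
        }
        return false;
    }

    int currentIndex() {
        return current;
    }
}
```

A false return value is where the master would start buffering or, after
some time, give up on replication.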


>* Fail-over: When fail-over is performed (with a command on the
>  slave), I assume will the master be told to stop its replication so
>  it can tidy up? Since you describe the semantics of the stop
>  replication command as shutting down the slave database, I assume
>  failover can be performed without requiring a prior stop replication
>  command on the master.

You are correct. We would not issue a stop on the master to complete a
fail-over.

>  If reaching master is not possible (lost connection), can the stop
>  replication command be used against the master to tidy up even when
>  connection has been lost?

Not being able to reach the master would mean that the replication process
should stop automatically. The sender should quit trying to send logs.
But this case could still arise if you issue the stop command between the
time the master tries to connect to the slave and the time the master
automatically stops replication.

In this case a stop command should close down all replication behaviour,
independently of whether a connection is actually established. The only
difference is that if the connection is OK, the slave is shut down as well.
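
As a rough sketch of these stop semantics (again hypothetical, the names
MasterReplicationControl and SlaveLink do not exist in Derby):

```java
/** Hypothetical sketch of the stop-replication semantics (not Derby code). */
class MasterReplicationControl {

    /** Hypothetical handle to the slave; only usable while connected. */
    interface SlaveLink {
        boolean isConnected();
        void requestShutdown();
    }

    private final SlaveLink slave;
    private boolean replicating = true;

    MasterReplicationControl(SlaveLink slave) {
        this.slave = slave;
    }

    /**
     * Stops replication on the master in all cases. Returns true if the
     * slave was also told to shut down, which needs a working connection.
     */
    synchronized boolean stopReplication() {
        boolean slaveStopped = false;
        if (replicating && slave.isConnected()) {
            slave.requestShutdown();
            slaveStopped = true;
        }
        // Local tidy-up happens whether or not the slave was reachable.
        replicating = false;
        return slaveStopped;
    }

    synchronized boolean isReplicating() {
        return replicating;
    }
}
```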

> Perhaps it would be good to include in your table of commands any
> preconditions for the commands. BTW, It seems a good idea to not
> impose an order on starting server or slave first.

I will add a preconditions column to the table of commands and fill in the
information for each command. I will submit a v4 of the func spec for this.

>* Presumably, the master will time out if it is unable to send logs to
>  the slave for some (configurable?) period. It could keep trying for
>  some time but eventually it would need to stop or overflow the
>  buffer mechanism you suggest in DERBY-2926. Will you require that
>  the user has LOG_ARCHIVE_MODE enabled? If you do, it would seem a
>  nice addition to later be able to resume replication even if the
>  buffer had to be abandoned (as you suggest). *Before* the buffer is
>  abandoned, if the network becomes available again, it would be
>  trivial to resume shipping of logs I expect?

I agree, this could be a great addition, because if the buffer did overflow
we could resume sending logs from the backed-up logs.


>* Given that the system privileges work of DERBY-2109 provides us with
>  necessary security, I would hope we can lift the restriction that
>  administration commands can only be run from the same machine as the
>  server is started on, but for the time being the restriction makes
>  sense.

I agree.

>* You describe the replication commands as CLI commands against
>  NetworkServerControl; will you be making the commands available in
>  API form as well, so replication can be embedded in an application?

The slave is never booted while it is receiving logs. Hence
we would not be able to actually use a stored procedure here. This was
the reason we had earlier decided on a CLI command against NetworkServerControl.

However, if by public API you mean public methods in NetworkServerControl that
can be reached from outside if anyone wants to code an admin program at a later
stage, I think this is a great idea and can be easily done.

>* typos: "enclypted", "it's local log"

  Will fix these in the next version.

> Add Replication functionality to Derby
> --------------------------------------
>
>                 Key: DERBY-2872
>                 URL: https://issues.apache.org/jira/browse/DERBY-2872
>             Project: Derby
>          Issue Type: New Feature
>          Components: Miscellaneous
>    Affects Versions: 10.4.0.0
>            Reporter: Jørgen Løland
>            Assignee: Jørgen Løland
>         Attachments: proof_of_concept_master.diff, proof_of_concept_master.stat, proof_of_concept_slave.diff, proof_of_concept_slave.stat, replication_funcspec.html, replication_funcspec_v2.html, replication_funcspec_v3.html, replication_script.txt
>
>
> It would be nice to have replication functionality to Derby; many potential Derby users seem to want this. The attached functional specification lists some initial thoughts for how this feature may work.
> Dag Wanvik had a look at this functionality some months ago. He wrote a proof of concept patch that enables replication by copying (using file system copy) and redoing the existing Derby transaction log to the slave (unfortunately, I can not find the mail thread now).
> DERBY-2852 contains a patch that enables replication by sending dedicated logical log records to the slave through a network connection and redoing these.
> Replication has been requested and discussed previously in multiple threads, including these:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200504.mbox/%3c426E04C1.1070904@yahoo.de%3e
> http://www.nabble.com/Does-Derby-support-Transaction-Logging---t2626667.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.