db-derby-dev mailing list archives

From Jørgen Løland (JIRA) <j...@apache.org>
Subject [jira] Commented: (DERBY-2872) Add Replication functionality to Derby
Date Mon, 09 Jul 2007 09:57:04 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511066
] 

Jørgen Løland commented on DERBY-2872:
--------------------------------------

Thank you for the extensive comments from both Rick and Narayanan. I have a few supplementary
comments to those from Narayanan.

>>Looks like you have addressed issue (1). I see in your comments above, that you are
in agreement about how to address issue (2), but I don't see this reflected in the new spec
itself. I'm getting the impression that the answer to (3) and (4) is that the first rev of
replication won't handle these issues; instead, they will be addressed in a later rev. Is
that right?
>I interpret it that a manual startup is planned for now. Is an auto startup on the cards?


Re 2: It says so below the table of NetworkServerControl commands, but I will make it clearer
in the next version of the spec.
Re 3 and 4: That's correct; in the first rev, there will be no automatic restart of replication
when one of the instances has failed. The DB owner will have to manually restart replication.
Automating this step is a good candidate for extending the functionality later.

>>5) A heads-up about the user/password options on the new server commands. There has
been some discussion about authenticating server shutdown operations and general agreement
that the current situation is confusing. DERBY-2109 intends to add credentials to the server
shutdown command. I think that the same api should be used to specify username and password
for all of our server commands--whatever that api turns out to be.
>Thank you for this pointer. I guess taking the same lines as 2109 is the thing to do here.


I agree. There is no reason why authentication for replication should differ from other commands.
The NetworkServerControl commands I wrote in the func spec show what information is needed.
I will modify the next version of the func spec to state that authentication is needed, and
should be performed in the same manner as other NetworkServerControl commands.

>>6) I think it would be clearer if the url option were called slaveurl. Do we need
a symmetric masterurl option for the startslave command? How does the slave know that it is
receiving records from the correct master? What  happens if two masters try to replicate to
the same slave?

>This would be an issue I guess because the slave would assume both to be legitimate unless
we send the database name each time.
>But what would happen if both use the same database also.
>Can this be eliminated by having a handshake phase before the actual log transfer occurs.
So if the same url is being used for a second handshake we would reject this unless this is
a reconnect attempt after the master has
crashed. 

We should only allow one connection to a slave database. A handshake sounds like a good idea.
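
As an illustration of the handshake idea, here is a minimal sketch of a slave-side gate that admits one master at a time and only accepts a second handshake when it is a reconnect from the same master after its connection was lost. The class name, the notion of a master id, and the method names are all hypothetical; nothing like this exists in the func spec or the proof-of-concept patches yet.

```java
import java.util.Objects;

/** Hypothetical sketch: admits at most one master per slave database. */
final class SlaveHandshakeGate {
    private final String databaseName;
    private String currentMasterId;   // null until a master has attached
    private boolean connectionAlive;  // false once the connection is lost

    SlaveHandshakeGate(String databaseName) {
        this.databaseName = Objects.requireNonNull(databaseName);
    }

    /** Returns true if this master may start shipping log records. */
    synchronized boolean accept(String masterId, String requestedDatabase) {
        Objects.requireNonNull(masterId);
        if (!databaseName.equals(requestedDatabase)) {
            return false; // handshake names a different database
        }
        if (currentMasterId == null) {
            currentMasterId = masterId; // first master wins
            connectionAlive = true;
            return true;
        }
        // Reconnect attempt: same master, and the old connection is gone.
        if (currentMasterId.equals(masterId) && !connectionAlive) {
            connectionAlive = true;
            return true;
        }
        return false; // a second master, or a duplicate live connection
    }

    /** Called by the slave when the network connection to the master drops. */
    synchronized void connectionLost() {
        connectionAlive = false;
    }
}
```

With this shape, a second master replicating the same database name is rejected even if it uses the same url, because the decision is based on the identity sent in the handshake rather than on the url alone.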


>>7) Is the startmaster command restricted to a server running on the same machine as
the master database? Similarly, is the startslave command restricted to a server on the slave
database machine? What about failover and stop?

I think the start and failover commands need to be restricted to the machine on which the
database resides, but this depends on the NetworkServerControl security. Again, this should
be equal to the policy for other NetworkServerControl commands. See 12) for how to stop replication.

>>8) I am confused about the startslave command. Does this create a new database? If
so, how are the credentials enforced in the case that credentials are stored in the database?
If not, what happens if there is already a database by that name? Is the database destroyed
and replaced after authentication?

Since this has not been implemented yet, the solution may have to change later. However, the
current intention is that the first thing that happens on the slave is that it receives the
database 'x' from the master. When 'x' has been received, the slave starts the boot process
of 'x'. So, the slave does not create 'x', even though it did not exist on the slave when
the startslave command was issued. 

We will have to check that a database with the same name does not exist on the slave. Furthermore,
we should probably ensure that the owner of 'x' is allowed to create a database on the slave.
Did you think of any other permissions we should check for? Maybe an allowedToReplicate credential
would be needed?

>>9) If you have stopped replication, can you resume it later on?
>If stopping replication means that we will not archive logs anymore I guess this will
not be possible. If the logs are still archived we can transmit from the log after replication
has been stopped and the slave can still redo from there and replication can continue. That
is we should not call SYSCS_UTIL.SYSCS_DISABLE_LOG_ARCHIVE_MODE system procedure after stopping
replication. Guess the user should be able to decide this.

I am not sure about this. If a failover was performed, the answer is definitely 'no' because
the replication method assumes that the physical layouts of the databases are identical. A failover
will not preserve this identical physical layout since the failover process will undo
uncommitted transactions. If the replication was simply turned off, Narayanan's suggestion
of starting log shipment from some defined log record will probably work. 

However, I think we have to be restrictive in the first version of the functionality. For
now, I think the answer will be 'no', i.e., you have to restart replication by first deleting
the database (on the slave), and then sending the entire database to the slave. Resuming replication
makes a good candidate for extending the functionality.

>>10)  What is the sequence of these commands? Do you first issue a startmaster and
then issue a startslave? What happens if the commands occur out of sequence? Similarly for

>Since the startslave starts a listener this should be done first before startmaster. 

It is correct that the slave will be listening for the master and therefore must be started
before replication can start. However, I see no reason why the connection attempts should
not be retried every now and then until the slave is ready to accept the connection.

Hence, I don't think we need a defined sequence of commands. When the slave starts, it does
nothing until a master connects to it (except write some messages to derby.log). When the
master is started, it continues as normal (also writes some messages to derby.log) until it
is able to get a connection to the slave. 
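
The retry behaviour described above could look roughly like the following sketch. The class name, the bounded attempt count, and the fixed pause are assumptions made for illustration; the spec does not say how often or how long the master should retry.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a master retrying its connection to the slave. */
final class MasterConnectRetry {
    /**
     * Calls tryConnect until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used, or -1 if no attempt succeeded
     * or the waiting thread was interrupted.
     */
    static int connectWithRetry(BooleanSupplier tryConnect,
                                int maxAttempts,
                                long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt; // slave was ready; replication can start
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // wait before the next try
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep interrupt status
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

In a real master the loop would run on a background thread so that the database keeps serving transactions as normal while waiting for the slave.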

>>11) It would be nice to understand how we insulate replication from man-in-the-middle
attacks--even if we don't implement these protections in this first version.

That is a good point. It would, e.g., be possible to use a signature. The slave could send
a hashed username to the master, and the master could respond by sending the hashed password.
It should not be possible to "unhash" the username/password. But I am no security expert,
hence input on this issue is appreciated. And you are right; this will not be handled in the
first version.
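
For discussion only, here is a sketch of a different technique from the hashed username/password exchange suggested above: a nonce-based challenge-response using HMAC-SHA256. It is swapped in because a fixed hash can be replayed by anyone who observes it once, whereas a response computed over a fresh random nonce cannot. The class name and the idea of a secret shared between master and slave are assumptions; the sketch authenticates the peer only and does not protect the log records in transit.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of a challenge-response check between slave and master. */
final class ReplicationChallenge {
    /** Slave side: a fresh random challenge for each connection attempt. */
    static byte[] newNonce() {
        byte[] nonce = new byte[16];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    /** Master side: proves knowledge of the secret without sending it. */
    static byte[] respond(String sharedSecret, byte[] nonce) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(
                    sharedSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(nonce);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Slave side: recomputes the response and compares in constant time. */
    static boolean verify(String sharedSecret, byte[] nonce, byte[] response) {
        return MessageDigest.isEqual(respond(sharedSecret, nonce), response);
    }
}
```

Protecting the replicated log itself against tampering or eavesdropping would need an encrypted channel on top of this, which is outside what the sketch shows.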

>>12) What happens if someone tries to connect to an active slave? What happens if someone
tries to shutdown an active slave without first stopping replication at the master's end?

If someone tries to connect to a db 'x' that has the slave role in derby instance 'i', the
connection is refused. Note that the derby instance 'i' may manage other databases at the
same time. Making a connection to these other databases is unaffected by replication.
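
A minimal sketch of that per-database check, assuming a hypothetical registry kept by the Derby instance (the class and method names are made up for illustration):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks which databases of one instance are slaves. */
final class SlaveRoleRegistry {
    private final Set<String> slaveDatabases = ConcurrentHashMap.newKeySet();

    /** Called when startslave puts a database into the slave role. */
    void enterSlaveMode(String databaseName) {
        slaveDatabases.add(databaseName);
    }

    /** Called when replication stops or a failover completes. */
    void leaveSlaveMode(String databaseName) {
        slaveDatabases.remove(databaseName);
    }

    /** Client connections are refused only for databases in the slave role. */
    boolean mayConnect(String databaseName) {
        return !slaveDatabases.contains(databaseName);
    }
}
```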

>A connect attempt from the master would fail and the master would report that the connection
has been terminated due to the slave not being able to be reached or that a slave could not
be found. Would this case be different from trying to connect to a Derby NetworkServer when
it has been shutdown? 

The initial plan was to allow shutdown at both ends. Now that you mention it, however, stopping
replication from the master seems cleaner. Hence, I think the revised plan should
be as follows: Stopping replication will be performed by issuing the stopreplication command
at the master. The master then sends a stop replication message over the network connection
to the slave.

>>13) What happens if the slave is shut down and then, later on, someone tries to boot
the slave as an embedded database?

That will be allowed. In this case, the database will boot to a transaction-consistent
state that includes all transactions that were committed (and sent, of course) before the
shutdown.

> Add Replication functionality to Derby
> --------------------------------------
>
>                 Key: DERBY-2872
>                 URL: https://issues.apache.org/jira/browse/DERBY-2872
>             Project: Derby
>          Issue Type: New Feature
>          Components: Miscellaneous
>    Affects Versions: 10.4.0.0
>            Reporter: Jørgen Løland
>            Assignee: Jørgen Løland
>         Attachments: proof_of_concept_master.diff, proof_of_concept_master.stat, proof_of_concept_slave.diff,
proof_of_concept_slave.stat, replication_funcspec.html, replication_funcspec_v2.html, replication_script.txt
>
>
> It would be nice to have replication functionality to Derby; many potential Derby users
seem to want this. The attached functional specification lists some initial thoughts for how
this feature may work.
> Dag Wanvik had a look at this functionality some months ago. He wrote a proof of concept
patch that enables replication by copying (using file system copy) and redoing the existing
Derby transaction log to the slave (unfortunately, I can not find the mail thread now).
> DERBY-2852 contains a patch that enables replication by sending dedicated logical log
records to the slave through a network connection and redoing these.
> Replication has been requested and discussed previously in multiple threads, including
these:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200504.mbox/%3c426E04C1.1070904@yahoo.de%3e
> http://www.nabble.com/Does-Derby-support-Transaction-Logging---t2626667.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

