db-derby-dev mailing list archives

From "Bergquist, Brett" <BBergqu...@canoga.com>
Subject RE: Question on recovering after replication break because of a system failure
Date Wed, 15 Jan 2014 16:19:26 GMT
I took a look at PostgreSQL, and its ability to restore its WAL files and then switch to
streaming mode for replication once those are complete is almost exactly what I want.

From its manual:

http://www.postgresql.org/docs/9.3/static/warm-standby.html

" At startup, the standby begins by restoring all WAL available in the archive location, calling
restore_command. Once it reaches the end of WAL available there and restore_command fails,
it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming
replication has been configured, the standby tries to connect to the primary server and start
streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming
replication is not configured, or if the connection is later disconnected, the standby goes
back to step 1 and tries to restore the file from the archive again. This loop of retries
from the archive, pg_xlog, and via streaming replication goes on until the server is stopped
or failover is triggered by a trigger file."
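The loop described in the quoted passage can be modeled roughly as follows. This is an illustrative sketch, not PostgreSQL's implementation: `WalSource` and `StandbyLoop` are hypothetical names, each source is reduced to "apply the next WAL segment if one is available", and the stop condition stands in for server shutdown or the failover trigger file.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** One place a standby can obtain WAL from: the archive, pg_xlog, or the streaming connection. */
@FunctionalInterface
interface WalSource {
    /** Applies the next WAL segment; returns false when nothing is available or the source fails. */
    boolean applyNext();
}

/** Model of the standby's retry loop: drain each source in order, then start over from the archive. */
final class StandbyLoop {
    private final List<WalSource> sources;       // archive, pg_xlog, then streaming (if configured)
    private final BooleanSupplier stopRequested; // server shutdown or failover trigger file
    private final long retryDelayMillis;         // pause after every source has failed

    StandbyLoop(List<WalSource> sources, BooleanSupplier stopRequested, long retryDelayMillis) {
        this.sources = List.copyOf(sources);
        this.stopRequested = stopRequested;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Runs until a stop is requested and returns the number of segments applied. */
    int run() {
        int applied = 0;
        while (!stopRequested.getAsBoolean()) {
            for (WalSource source : sources) {
                // Keep reading from this source until it runs dry, fails, or disconnects.
                while (!stopRequested.getAsBoolean() && source.applyNext()) {
                    applied++;
                }
                if (stopRequested.getAsBoolean()) {
                    return applied;
                }
            }
            // Every source has failed: wait, then go back to step 1 (the archive).
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return applied;
            }
        }
        return applied;
    }
}
```

Passing the sources as an ordered list keeps the "archive, then pg_xlog, then streaming" priority explicit; leaving streaming out of the list corresponds to streaming replication not being configured.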

Just thinking outside the box, and not yet knowing Derby's internals, I could see something
like:

- a slave connects to the master and issues the equivalent of "start replication"
- the master uses something similar to the online backup, but instead of writing the backup
to a file, it writes the backup to a stream that is transported to the slave. At the same
time it starts the replication stream, writing the log entries to the slave. Both are started
together because, as far as I can tell, an online backup is consistent with the database as
it was when the backup began (i.e. changes made to the database while the online backup is
running are not present in the backup). So if the backup and the shipping of the replication
logs start at the same time, then once the backup is complete, the backup plus the replication
log entries represent a consistent state of the database
- the slave processes the backup stream, creating the database, until the backup is complete.
Meanwhile it receives the replication logs and persists them. Once the backup has been
completely received, it processes the persisted replication logs and then continues to
process new replication logs as they arrive.
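The slave side of this idea can be sketched as below. This is a hypothetical, single-threaded model, not Derby code: `LogRecord`, `BufferingSlave`, and the log sequence number (`lsn`) on each record are assumptions for illustration, and the in-memory queue stands in for the on-disk storage the slave would need. One further assumption: the finished backup reports the LSN it is consistent with, so buffered records at or below it are skipped, which covers any overlap between the backup and the start of log shipping.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** A shipped log record, identified by a monotonically increasing log sequence number (hypothetical). */
final class LogRecord {
    final long lsn;
    final String payload;

    LogRecord(long lsn, String payload) {
        this.lsn = lsn;
        this.payload = payload;
    }
}

/** Slave that buffers replication log records until the backup stream has been fully restored. */
final class BufferingSlave {
    private final Queue<LogRecord> pending = new ArrayDeque<>(); // stands in for persisted log files
    private final List<LogRecord> applied = new ArrayList<>();
    private boolean backupComplete = false;
    private long backupLsn = -1; // log position the restored backup is consistent with

    /** Called for every log record received from the master. */
    void onLogRecord(LogRecord record) {
        if (backupComplete) {
            apply(record);
        } else {
            pending.add(record); // cannot apply yet: the database files are still arriving
        }
    }

    /** Called once the last chunk of the backup stream has been written to disk. */
    void onBackupComplete(long consistentAtLsn) {
        backupLsn = consistentAtLsn;
        backupComplete = true;
        // Replay everything buffered during the transfer, oldest first.
        while (!pending.isEmpty()) {
            apply(pending.poll());
        }
    }

    private void apply(LogRecord record) {
        if (record.lsn <= backupLsn) {
            return; // already reflected in the backup image
        }
        applied.add(record); // a real slave would redo the change against the restored database
    }

    List<LogRecord> appliedRecords() {
        return List.copyOf(applied);
    }
}
```

After `onBackupComplete` returns, the slave is in the normal "apply as received" mode, which is the state that replication reaches today only after the copy step.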

I understand that this might take a while to complete and would require storage at the slave
to persist the replication logs while processing the backup from the master. I also understand
that the equivalent of the online backup might slow down the master while this is occurring,
but I think having the ability to bring up a slave without any downtime on the master
would be a great feature.

I think Derby internally already has most of what is needed to accomplish this:

   - It can already perform an online backup. What would need to be added is the ability to
write the backup data over a network connection instead of to file system storage.
   - It can already perform asynchronous replication using the recovery log. What would need
to be added, on the slave side, is the ability to buffer this log without processing it until
a consistent backup has been received.
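For reference, both existing pieces are exposed through documented Derby interfaces: online backup is the `SYSCS_UTIL.SYSCS_BACKUP_DATABASE` system procedure, which writes to a directory, and replication is started by booting the database with the `startSlave=true` or `startMaster=true` connection URL attributes plus `slaveHost` and `slavePort`. The sketch below only assembles those calls; the class name, database name, host, and backup path are placeholders. Writing the backup to a network stream has no existing API, so it is not shown.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

/** Strings and calls for two existing Derby features: online backup and replication startup. */
final class DerbyReplicationCommands {
    /** Derby's default replication port. */
    static final int DEFAULT_SLAVE_PORT = 4851;

    /** Online backup procedure; the database stays open while the copy is made. */
    static final String ONLINE_BACKUP_SQL = "CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE(?)";

    /** URL booted on the slave machine; the slave listens on slaveHost:slavePort for its master. */
    static String startSlaveUrl(String dbName, String slaveHost, int slavePort) {
        return "jdbc:derby:" + dbName + ";startSlave=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    /** URL booted on the master machine; the master connects to the slave at slaveHost:slavePort. */
    static String startMasterUrl(String dbName, String slaveHost, int slavePort) {
        return "jdbc:derby:" + dbName + ";startMaster=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    /** Copies the open database into backupDir, a local directory (the part the proposal would replace). */
    static void onlineBackup(Connection conn, String backupDir) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(ONLINE_BACKUP_SQL)) {
            cs.setString(1, backupDir);
            cs.execute();
        }
    }
}
```

Following the steps described in this thread, the database is copied first, then the slave URL is booted on the slave machine, and then the master URL on the master machine.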

Any thoughts on this?

-----Original Message-----
From: Bergquist, Brett [mailto:BBergquist@canoga.com] 
Sent: Tuesday, January 14, 2014 12:49 PM
To: derby-dev@db.apache.org
Subject: RE: Question on recovering after replication break because of a system failure

Actually the expensive part is having the "master" system down to ensure that a completely
accurate copy of the database is made on the "slave". Note that my "master" here could actually
be the (original) slave system once the (original) master system is repaired.

Derby's replication seems to be okay once the systems are in sync and running. It is the
initial setup, getting the "slave" database to be the same as the "master" database, that
is expensive, because currently (correct me if I am wrong) the master cannot be modified
while this is happening. Likewise, restoring replication once a failed system is repaired
is again expensive.

I guess I will look at how other databases handle this case. I can't imagine that adding a
"replication slave" requires the master database to be down and quiescent. I would imagine
that it is possible to add a "replication slave" while the "replication master" is hot and
running. This is what I would like Derby to be able to do (note that I am not asking
someone else to do it; it could very well be a contribution from me).

An analogy would be replacing a failed disk in a RAID array. The RAID array continues to
operate with the failed disk installed. Then the disk is removed and a new one is installed.
Access to the RAID array is not blocked while the RAID rebuilds the missing data on the
new disk.

It would be really useful for Derby to operate similarly, so that the replication database
can be rebuilt in the background. While this is being done, replication is degraded (with
the current one-to-one replication it is, of course, not operating at all), just as a RAID
array is degraded while the disk is being resilvered, but once the process is done,
replication is up and running.
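The behavior asked for here can be summarized as a small state machine for a master/slave pair. The states and transitions are a hypothetical illustration of the proposal, not existing Derby states; the point is that the master keeps accepting work in every state, and only the redundancy changes.

```java
/** Hypothetical lifecycle of a master/slave pair with background resynchronization. */
enum PairState {
    IN_SYNC,   // replication running: the slave can take over if the master fails
    DEGRADED,  // peer lost: the surviving database runs alone, like a RAID array missing a disk
    RESYNCING; // new peer being rebuilt in the background while the master stays writable

    /** Redundancy exists only when both sides are in sync. */
    boolean redundant() {
        return this == IN_SYNC;
    }

    /** Losing the peer (including during a resync) drops the pair to DEGRADED. */
    PairState onPeerFailure() {
        return DEGRADED;
    }

    /** A repaired or replacement machine is attached and the background copy starts. */
    PairState onPeerAttached() {
        if (this != DEGRADED) {
            throw new IllegalStateException("no failed peer to replace in state " + this);
        }
        return RESYNCING;
    }

    /** The copy has caught up with the master; normal replication resumes. */
    PairState onResyncComplete() {
        if (this != RESYNCING) {
            throw new IllegalStateException("no resync in progress in state " + this);
        }
        return IN_SYNC;
    }
}
```

Today's procedure effectively inserts a fourth, "master unavailable" state during the copy; the proposal is to remove that state by making RESYNCING non-blocking.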

-----Original Message-----
From: Rick Hillegas [mailto:rick.hillegas@oracle.com]
Sent: Tuesday, January 14, 2014 11:40 AM
To: derby-dev@db.apache.org
Subject: Re: Question on recovering after replication break because of a system failure

Hi Brett,

I'm afraid that I'm not following your proposal. Some comments inline...

On 1/10/14 1:45 PM, Bergquist, Brett wrote:
>
> The reason I am posting to the dev list is that I might want to look 
> into improving Derby in this area.
>
> Just so that I understand correctly, the steps for replication are:
>
> * Make a copy of the database to the slave
>
This seems to be the expensive step which results in long downtime.
>
> * Start replication on the slave and on the master
>
> Now assume that this is working right along and all is well, and then
> the system with the master fails. So replication is broken, and the
> slave can be restarted in non-replication mode. Time goes along
> and changes are made to the non-replicated database on the slave.
> Finally the master machine is brought back online.
>
> So to get replication going we need to:
>
> * Copy the database from the slave to the master
>
> * Start replication on the slave and on the master
>
> This assumes that we have an affinity for the master remaining the
> master, but even if this is not the case and the old slave is going to
> become the new master, we need to copy the database from the slave to
> the master before starting replication again.
>
> Given a fairly large database (say on the order of 200 GB) and a link
> between the master and slave that is slower than gigabit, the transfer
> could take a fairly long time. Unfortunately, during this transfer
> time neither database can be used. So while replication allows quick
> failover after an initial failure, re-establishing replication once
> the failure has been resolved can cause a substantially long downtime.
>
> So my question is: is there any way that this downtime can be reduced?
> Could something be done by restoring a backup database, using the
> logs, and then enabling replication? Something like:
>
> * Make a file-system-level backup of the slave (using something like
> freeze and a ZFS snapshot, this can take only a couple of seconds) and
> then allow the slave to continue
>
>   o Assuming that the database logs are kept so that they can be
> replayed later
>
> * Transfer the database to the master
>
I don't understand how this step is different from the expensive step you want to eliminate.

Thanks,
-Rick
>
> * Transfer the logs
>
>   o Replay each log on the master somehow, to get the master as close
> as possible to the slave
>
> * Stop the slave so that it becomes consistent
>
> * Transfer the last log to the master and replay it on the master
>
> * Enable replication on the master and the slave
>
> Basically, this avoids downtime while the database transfer and log
> file transfer are taking place, leaving only a small window of downtime
> where the databases need to be brought in sync before replication
> can be started again.
>
> Any thoughts on this?   Is this an approach that is worth looking at?
>
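To put a rough number on the transfer time in the quoted message: copying B bytes over a link that sustains R bits per second takes about t = 8B / R seconds. The helper below assumes that 200 GB means 200 × 10^9 bytes and that the link runs at its full nominal rate with no protocol overhead, both optimistic; the link speeds in the usage note are illustrative, since the message only says the link is slower than gigabit.

```java
/** Back-of-the-envelope time to copy a database over a network link. */
final class TransferTime {
    /**
     * @param bytes         database size in bytes
     * @param bitsPerSecond sustained link throughput in bits per second
     * @return transfer time in seconds, ignoring protocol overhead
     */
    static double seconds(double bytes, double bitsPerSecond) {
        if (bitsPerSecond <= 0) {
            throw new IllegalArgumentException("link speed must be positive");
        }
        return bytes * 8 / bitsPerSecond;
    }
}
```

`TransferTime.seconds(200e9, 1e9)` gives 1,600 s (about 27 minutes) on a gigabit link; at 100 Mbit/s it is 16,000 s (about 4.4 hours), and at 10 Mbit/s 160,000 s (about 44 hours). Per the message, neither database is usable for that whole window.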

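The first step of the proposed procedure, a file-system snapshot taken while the database is briefly frozen, can be expressed with Derby's documented `SYSCS_UTIL.SYSCS_FREEZE_DATABASE` and `SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE` procedures. This is a minimal sketch, assuming a caller-supplied way to run SQL and to take the snapshot (for example, running `zfs snapshot` through `ProcessBuilder`); `FrozenSnapshot`, `SqlRunner`, `Snapshot`, and the dataset name are hypothetical. The unfreeze sits in a `finally` block so that a failed snapshot does not leave the database blocked.

```java
import java.sql.Connection;
import java.sql.Statement;

/** Runs one SQL statement; in production this wraps a JDBC connection. */
@FunctionalInterface
interface SqlRunner {
    void execute(String sql) throws Exception;

    /** Adapts a JDBC connection, e.g. one opened on a "jdbc:derby:..." URL. */
    static SqlRunner of(Connection conn) {
        return sql -> {
            try (Statement st = conn.createStatement()) {
                st.execute(sql);
            }
        };
    }
}

/** Takes a file-system snapshot while the Derby database is frozen, then always unfreezes it. */
final class FrozenSnapshot {
    static final String FREEZE = "CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()";
    static final String UNFREEZE = "CALL SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE()";

    /** Hypothetical snapshot hook supplied by the caller. */
    @FunctionalInterface
    interface Snapshot {
        void take() throws Exception;
    }

    static void take(SqlRunner sql, Snapshot snapshot) throws Exception {
        sql.execute(FREEZE); // Derby stops writing to the database files until UNFREEZE
        try {
            snapshot.take(); // typically seconds on a copy-on-write file system such as ZFS
        } finally {
            sql.execute(UNFREEZE); // resume normal operation even if the snapshot failed
        }
    }

    /** Example hook that shells out to zfs; "pool/derby@resync"-style names are placeholders. */
    static Snapshot zfs(String datasetAtName) {
        return () -> {
            Process p = new ProcessBuilder("zfs", "snapshot", datasetAtName).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("zfs snapshot failed for " + datasetAtName);
            }
        };
    }
}
```

The remaining steps (transferring the snapshot, replaying logs on the master, and the final short outage) are the parts that Derby does not currently provide as a single operation.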
