From: Andrew Lawrenson
To: Derby Discussion
Date: Mon, 13 Oct 2008 04:26:51 -0400
Subject: Replication - switching back after failover

Hi again,

I'm using Derby Replication at the moment (with 4.3.2.0), and I'm after some advice on how to "fail back" after a fail-over has occurred.

The scenario I'm looking at is having a Primary Server, which is normally the database master, and a Secondary Server, in a different location, which is the slave.

If the primary fails, everything switches over to the Secondary OK. What I'm wondering about is the best way to switch back to the original setup, with the Primary being the Master and the Secondary being the Slave again. As I understand it, Derby does nothing automatic to do this; I have to do it myself. I can do this fine, but I need to minimise the downtime during which the system is unavailable. As the servers are in different locations, there is only limited bandwidth between the two, so copying the database back and forth may take some time.

So, the easiest way to do this is to stop both databases, copy the database from the secondary to the primary, then restart replication. However, this would require copying the database over the network twice whilst the system is down (once to transfer from slave to master, and again when starting replication).

I can improve on this by backing up the secondary, transferring the backup over the network to the primary, then shutting down both servers, copying the transaction logs written since the backup from the secondary to the primary, then doing a restore, then restarting replication. This requires only one big copy over the network whilst the servers are unavailable.

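
To make the backup-based plan concrete, here is a minimal Java sketch. It assumes that "copying the transaction logs and doing a restore" maps onto Derby's log-archive backup procedure plus the `rollForwardRecoveryFrom` connection attribute; whether that combination behaves well on a database that has just been through replication failover is not something this sketch establishes. The class name, method names, and all paths are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper for the "backup, ship, roll forward" fail-back idea.
final class FailbackBackupSketch {

    // Derby system procedure: full online backup that also turns on
    // log archive mode, so later log files are kept for roll-forward.
    static final String BACKUP_WITH_LOG_ARCHIVE =
        "CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE(?, ?)";

    // Run on the secondary while it is still serving clients.
    // backupDir is a directory local to the secondary (hypothetical path).
    static void backupWithLogArchive(Connection secondary, String backupDir)
            throws SQLException {
        try (CallableStatement cs = secondary.prepareCall(BACKUP_WITH_LOG_ARCHIVE)) {
            cs.setString(1, backupDir);
            // Non-zero: delete archived log files from before this backup.
            cs.setShort(2, (short) 1);
            cs.execute();
        }
    }

    // Connection URL used on the primary once the backup copy and the
    // later log files have been transferred. backupCopy is the path of
    // the backed-up database directory on the primary.
    static String rollForwardUrl(String dbPath, String backupCopy) {
        return "jdbc:derby:" + dbPath + ";rollForwardRecoveryFrom=" + backupCopy;
    }
}
```

Roll-forward recovery, as documented, replays the archived and active log files found in the database's log directory, so the logs copied from the secondary would need to be in place there before connecting with the URL from `rollForwardUrl`.
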
However, my main query is regarding the need to copy the database from the master to the slave when starting replication. The docs state:

"Before you start replication, you must boot the master database and then copy the database to the slave location. To ensure that the master database is not modified between the time you start the file-system copy and the time you start replication, you must freeze the master database..."

In this scenario, where you've just copied the slave to the master, copying it back again seems a little excessive. Is there any way to "streamline" this if you know that the two databases are the same (e.g. just copying the logs, but not the main database files)? If something is possible, does it make any difference whether you do a straight binary copy of the slave database, versus doing a restore (which should create a logically identical database, but may not be the same exact binary files)?

Many thanks in advance for any advice,

Andrew Lawrenson
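
For reference, the documented start-up sequence quoted above (freeze the master, copy the database, start the slave, start the master) could be driven from Java roughly as follows. This is a sketch under assumptions: the class and method names, host names, and database names are hypothetical, and it only illustrates the documented sequence. It does not show any way of skipping the copy.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper covering the documented replication start-up steps.
final class ReplicationStartSketch {

    // Step 1: freeze the booted master so the file-system copy
    // sees a database that is not being modified.
    static void freeze(Connection master) throws SQLException {
        try (Statement s = master.createStatement()) {
            s.execute("CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()");
        }
    }

    // Step 2 (not shown): copy the database directory to the slave location.

    // Step 3: URL used on the slave side to put the copied database
    // into slave mode, listening on slaveHost:slavePort.
    static String startSlaveUrl(String db, String slaveHost, int slavePort) {
        return "jdbc:derby:" + db + ";startSlave=true;slaveHost=" + slaveHost
            + ";slavePort=" + slavePort;
    }

    // Step 4: URL used on the master side to start shipping log
    // records to the slave at slaveHost:slavePort.
    static String startMasterUrl(String db, String slaveHost, int slavePort) {
        return "jdbc:derby:" + db + ";startMaster=true;slaveHost=" + slaveHost
            + ";slavePort=" + slavePort;
    }
}
```

Keeping the URL construction separate from the JDBC calls makes it easy to log or review the exact connection attributes before anything is executed against either server.
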