db-derby-dev mailing list archives

From "Bergquist, Brett" <BBergqu...@canoga.com>
Subject Had a problem with SYSCS_FREEZE_DATABASE and am wondering if something can be done to make this more robust
Date Mon, 10 Sep 2012 18:07:29 GMT
Because the online backup was taking a long time and affecting performance, and because the
customer's system uses the ZFS file system on Solaris, I wrote a utility that does the
following:


1. Freezes the database
2. Invokes a system command to perform a ZFS snapshot
3. Unfreezes the database
4. Creates a backup of the ZFS snapshot using 'tar' and 'compress'
5. Removes the ZFS snapshot
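
For illustration, step 2 could be driven from Java roughly as follows. This is only a sketch: the class and method names are made up for this example, and the dataset and snapshot names are placeholders, not the ones the utility really uses. It assumes the `zfs` binary is on the PATH of the user running the utility.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ZfsSnapshotStep {

    // Builds the argument list for `zfs snapshot <dataset>@<snapshotName>`.
    // Both arguments are placeholders supplied by the caller.
    static List<String> snapshotCommand(String dataset, String snapshotName) {
        return Arrays.asList("zfs", "snapshot", dataset + "@" + snapshotName);
    }

    // Runs the command, forwards its output to this process, and returns
    // the exit code so the caller can decide whether the snapshot worked.
    static int runCommand(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A caller would do something like `runCommand(snapshotCommand("tank/derby", "nightly"))` between the freeze and the unfreeze, and treat a non-zero return value as a failed snapshot.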

The ZFS snapshot takes about 1 or 2 seconds, so the time between step 1 and step 3 is a couple
of seconds. The utility has checks to make sure that if step 1 succeeds, step 3 is always
performed. The basic logic looks like:

    private void run(String[] args) {
        parseArguments(args);
        loadDbDriver();
        final Connection conn = openDatabaseConnection();

        int res = 0;
        try {
            // Safety net: unfreeze if the JVM shuts down while frozen.
            Thread shutdownHook = new Thread() {
                @Override
                public void run() {
                    attemptToUnfreezeDatabase(conn);
                }
            };
            Runtime.getRuntime().addShutdownHook(shutdownHook);
            freezeDatabase(conn);
            try {
                // Runs the system command that takes the ZFS snapshot.
                res = executeCopyCommand();
            } finally {
                // Normal path: always unfreeze, then drop the safety net.
                unfreezeDatabase(conn);
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            }
        } finally {
            closeDatabaseConnection(conn);
        }

        System.exit(res);
    }
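
The bodies of freezeDatabase, unfreezeDatabase and attemptToUnfreezeDatabase are not shown above. A minimal sketch of what they might look like, assuming they simply call Derby's SYSCS_UTIL.SYSCS_FREEZE_DATABASE and SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE system procedures over the same JDBC connection (the class name and the private helper are made up for this example):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class FreezeHelpers {

    static final String FREEZE_SQL = "CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()";
    static final String UNFREEZE_SQL = "CALL SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE()";

    static void freezeDatabase(Connection conn) throws SQLException {
        call(conn, FREEZE_SQL);
    }

    static void unfreezeDatabase(Connection conn) throws SQLException {
        call(conn, UNFREEZE_SQL);
    }

    // Best-effort variant for the shutdown hook: it must not throw,
    // so any failure is only reported on stderr.
    static void attemptToUnfreezeDatabase(Connection conn) {
        try {
            unfreezeDatabase(conn);
        } catch (SQLException e) {
            System.err.println("Unfreeze failed: " + e.getMessage());
        }
    }

    // Hypothetical helper: prepares, executes and closes one CALL statement.
    private static void call(Connection conn, String sql) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(sql)) {
            cs.execute();
        }
    }
}
```

Note that both calls go through the connection that was opened before the freeze, so no new connection has to be created while the database is frozen.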

So it registers a shutdown hook and also runs the system-level ZFS snapshot command inside a
try/finally block; both are there to ensure that the unfreeze is done if the freeze was done.
This has been working really well each night for about 2 months, but on Saturday night
something failed.

From the stack traces of the Derby engine, it appears that something caused the utility
to fail after the database was frozen, and neither the shutdown hook nor the try/finally unfroze
the database. After that point, the database was effectively locked up. The system
was still operating, and connections kept being made to try to access the database until
all of the connections were exhausted.

So I was thinking that maybe the database engine should have some sort of protection in case
this happens. Maybe the database engine should automatically unfreeze the database if
the connection that froze it terminates or is closed. Or maybe a timer could be added
to the freeze command to automatically unfreeze the database after a given timeout.
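
Until something like that exists in the engine, the timer idea can be approximated on the client side. The sketch below is not a Derby feature; the class name and its API are invented for this example. It runs a supplied unfreeze action exactly once, either when the caller closes the guard or when the timeout expires, whichever comes first. It assumes the unfreeze action is safe to run from a second thread, and it cannot help if the utility's JVM itself is killed, which is why an engine-side timeout would still be more robust.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FreezeWatchdog implements AutoCloseable {

    private final Runnable unfreezeAction;
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final ScheduledExecutorService timer;

    // Starts the countdown immediately; create it right after the freeze.
    FreezeWatchdog(Runnable unfreezeAction, long timeout, TimeUnit unit) {
        this.unfreezeAction = unfreezeAction;
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "freeze-watchdog");
            t.setDaemon(true); // do not keep the JVM alive just for the timer
            return t;
        });
        this.timer.schedule(this::unfreezeOnce, timeout, unit);
    }

    // Runs the unfreeze action at most once, whoever gets here first.
    void unfreezeOnce() {
        if (done.compareAndSet(false, true)) {
            unfreezeAction.run();
        }
    }

    // Normal path: unfreeze now (if the timer has not already) and stop the timer.
    @Override
    public void close() {
        try {
            unfreezeOnce();
        } finally {
            timer.shutdownNow();
        }
    }
}
```

In the utility this could wrap the snapshot step in a try-with-resources block, for example `try (FreezeWatchdog w = new FreezeWatchdog(() -> attemptToUnfreezeDatabase(conn), 30, TimeUnit.SECONDS)) { res = executeCopyCommand(); }`, where the 30 second limit is an arbitrary example value.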

I am thinking this because of what I was told in a previous email thread. At that time I was
trying to build this utility purely as a script, using IJ to freeze the database, SH to perform
the ZFS snapshot, and IJ to unfreeze the database, and I was told that the freeze/unfreeze was
not expected to be done from separate connections. In fact, I ran into a problem with the
utility at that point: the IJ connection to unfreeze could not be created because the database
was frozen.

So I guess my question is: is there ever a use case that would require a database to be frozen
and not unfrozen before the connection is closed or lost?


