Because the online backup was taking a long time and affecting performance, and because the customer’s system was using the ZFS file system on Solaris, I wrote a utility that does the following:

1. Freezes the database
2. Invokes a system command to perform a ZFS snapshot
3. Unfreezes the database
4. Creates a backup of the ZFS snapshot using ‘tar’ and ‘compress’
5. Removes the ZFS snapshot
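
As a rough illustration of steps 2, 4 and 5 above, the commands the utility shells out to could be built as below. This is a sketch under assumptions, not the utility's actual `executeCopyCommand()`: the class name, the dataset name `pool/derbydata`, its mount point, the snapshot name and the output path are all hypothetical. It relies on the `zfs snapshot` / `zfs destroy` subcommands and on ZFS exposing a snapshot's read-only contents under `<mountpoint>/.zfs/snapshot/<name>`.

```java
import java.util.List;

/** Hypothetical helper that builds the shell commands for steps 2, 4 and 5. */
public class ZfsBackupCommands {
    private final String dataset;     // e.g. "pool/derbydata" (hypothetical)
    private final String mountPoint;  // where that dataset is mounted (hypothetical)
    private final String snapName;    // snapshot name, e.g. "nightly"

    public ZfsBackupCommands(String dataset, String mountPoint, String snapName) {
        this.dataset = dataset;
        this.mountPoint = mountPoint;
        this.snapName = snapName;
    }

    /** Step 2: take the snapshot (the only step that runs while the database is frozen). */
    public List<String> snapshot() {
        return List.of("zfs", "snapshot", dataset + "@" + snapName);
    }

    /** Step 4: tar the read-only snapshot directory and compress the stream. */
    public List<String> archive(String absoluteOutputFile) {
        String snapDir = mountPoint + "/.zfs/snapshot/" + snapName;
        // A pipeline needs a shell. The output path must be absolute because the
        // shell changes into the snapshot directory first; paths are assumed to
        // contain no shell metacharacters.
        return List.of("/bin/sh", "-c",
                "cd " + snapDir + " && tar cf - . | compress -c > " + absoluteOutputFile);
    }

    /** Step 5: remove the snapshot once the archive has been written. */
    public List<String> destroy() {
        return List.of("zfs", "destroy", dataset + "@" + snapName);
    }

    public static void main(String[] args) {
        ZfsBackupCommands cmds =
                new ZfsBackupCommands("pool/derbydata", "/derbydata", "nightly");
        System.out.println(cmds.snapshot());
        System.out.println(cmds.archive("/backup/derby.tar.Z"));
        System.out.println(cmds.destroy());
    }
}
```

Only the `snapshot()` command needs to sit between the freeze and the unfreeze; the slow tar/compress step reads the snapshot after the database is already back in service.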

 

The ZFS snapshot takes about 1 or 2 seconds, so the database stays frozen for only a couple of seconds between step 1 and step 3. The utility has checks to make sure that if step 1 succeeds, step 3 is always performed. The basic logic looks like this:

 

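    private void run(String[] args) {
        parseArguments(args);
        loadDbDriver();
        final Connection conn = openDatabaseConnection();

        int res = 0;
        try {
            // Safety net: if the JVM starts shutting down (e.g. SIGINT or
            // SIGTERM) while the database is frozen, try to unfreeze it.
            // Shutdown hooks do not run on SIGKILL or a JVM crash.
            Thread shutdownHook = new Thread() {
                @Override
                public void run() {
                    attemptToUnfreezeDatabase(conn);
                }
            };
            Runtime.getRuntime().addShutdownHook(shutdownHook);

            // Step 1: freeze the database.
            freezeDatabase(conn);
            try {
                // Step 2: take the ZFS snapshot while the database is frozen.
                res = executeCopyCommand();
            } finally {
                // Step 3: always unfreeze once the freeze has succeeded,
                // then drop the hook because it is no longer needed.
                unfreezeDatabase(conn);
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            }
        } finally {
            closeDatabaseConnection(conn);
        }

        System.exit(res);
    }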

 

So the utility both registers a shutdown hook and runs the system-level ZFS snapshot command inside a try/finally block; together these are meant to ensure that the database is unfrozen whenever it was frozen. This had been working well every night for about 2 months, but on Saturday night something failed.
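
The same guarantee (unfreeze exactly once, and only if the freeze succeeded) can also be expressed with try-with-resources. This is a minimal sketch, not the utility's code: the `FrozenDatabase` class and its `SqlAction` interface are hypothetical, and the `freeze(Connection)` factory assumes Derby's `SYSCS_UTIL.SYSCS_FREEZE_DATABASE()` and `SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE()` system procedures. It replaces only the inner try/finally; the shutdown hook is still needed for the JVM-exit case.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard: exists only after a successful freeze, unfreezes once on close(). */
public final class FrozenDatabase implements AutoCloseable {
    /** Abstraction over one SQL call, so the guard can be exercised without Derby. */
    public interface SqlAction {
        void run() throws SQLException;
    }

    private final SqlAction unfreeze;
    private final AtomicBoolean unfrozen = new AtomicBoolean(false);

    private FrozenDatabase(SqlAction unfreeze) {
        this.unfreeze = unfreeze;
    }

    /** Runs the freeze first; if it throws, no guard is created and nothing is unfrozen. */
    public static FrozenDatabase freeze(SqlAction freeze, SqlAction unfreeze) throws SQLException {
        freeze.run();
        return new FrozenDatabase(unfreeze);
    }

    /** Convenience factory using the Derby system procedures on one connection. */
    public static FrozenDatabase freeze(Connection conn) throws SQLException {
        return freeze(() -> call(conn, "CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()"),
                      () -> call(conn, "CALL SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE()"));
    }

    private static void call(Connection conn, String sql) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(sql)) {
            cs.execute();
        }
    }

    @Override
    public void close() throws SQLException {
        // compareAndSet turns a second close() into a no-op.
        if (unfrozen.compareAndSet(false, true)) {
            unfreeze.run();
        }
    }
}
```

In `run`, the inner try/finally would then become `try (FrozenDatabase frozen = FrozenDatabase.freeze(conn)) { res = executeCopyCommand(); }`, with the shutdown hook removed after that block.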

 

From the stack traces of the Derby engine, it appears that something caused the utility to fail after the database was frozen, and neither the shutdown hook nor the try/finally block unfroze it. From that point on, the database was effectively locked up. The rest of the system kept running and kept opening connections that tried to access the database, until all of the available connections were exhausted.
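
One possible way to end up in this state (not confirmed by the stack traces) is for the utility to stop making progress rather than exit: if `executeCopyCommand()` never returns, for example because it waits indefinitely on a hung child process, the `finally` block is never reached, and since the JVM never starts shutting down, the hook never runs either. Shutdown hooks are also skipped if the JVM is killed with SIGKILL, halted with `Runtime.halt`, or crashes. The sketch below guards only against the hang case by bounding how long the snapshot command may run. `BoundedCommand`, `runWithTimeout` and the `TIMED_OUT` sentinel are hypothetical names; discarding the child's output is an assumption made so that a full pipe cannot block the child.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BoundedCommand {
    /** Hypothetical sentinel meaning "killed after the timeout" (real exit codes are >= 0). */
    public static final int TIMED_OUT = -1;

    /**
     * Runs a command and waits at most timeoutSeconds for it to finish.
     * stdout and stderr are merged and discarded so the child never blocks on a full pipe.
     */
    public static int runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Kill the hung child so control returns to the caller's finally block.
            p.destroyForcibly();
            p.waitFor();
            return TIMED_OUT;
        }
        return p.exitValue();
    }
}
```

The snapshot step could then call `runWithTimeout` and treat `TIMED_OUT` as a failure; because the method always returns, the existing `finally` block still gets the chance to unfreeze the database.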

 

So I was thinking that the database engine should perhaps have some protection in case this happens. For example, the engine could automatically unfreeze the database when the connection that froze it is closed or terminated. Alternatively, a timer could be added to the freeze command so that the database is automatically unfrozen after a given period.
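
Until something like that exists in the engine, a client-side approximation of the timer idea is a watchdog that unfreezes the database if the normal path has not done so within a deadline. This is a minimal sketch under assumptions: `FreezeWatchdog` is a hypothetical class, the unfreeze action would be something like the utility's own `attemptToUnfreezeDatabase(conn)` run from another thread (as the shutdown hook already does), and like the hook it only helps while the JVM is alive.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical watchdog: runs `unfreeze` exactly once, on close() or after a deadline. */
public final class FreezeWatchdog implements AutoCloseable {
    private final Runnable unfreeze;
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final ScheduledExecutorService timer;
    private final ScheduledFuture<?> pending;

    /** Create only after the freeze has succeeded; the deadline starts immediately. */
    public FreezeWatchdog(Runnable unfreeze, long timeout, TimeUnit unit) {
        this.unfreeze = unfreeze;
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "freeze-watchdog");
            t.setDaemon(true); // the watchdog never keeps the JVM alive on its own
            return t;
        });
        this.pending = timer.schedule(this::unfreezeOnce, timeout, unit);
    }

    /** Whichever of the timer or close() gets here first runs the unfreeze; the other is a no-op. */
    private void unfreezeOnce() {
        if (done.compareAndSet(false, true)) {
            unfreeze.run();
        }
    }

    /** Normal path: cancel the deadline, unfreeze now (if not already done), stop the timer. */
    @Override
    public void close() {
        pending.cancel(false);
        unfreezeOnce();
        timer.shutdown();
    }

    public boolean isUnfrozen() {
        return done.get();
    }
}
```

It would be opened right after `freezeDatabase(conn)` succeeds, e.g. `try (FreezeWatchdog w = new FreezeWatchdog(() -> attemptToUnfreezeDatabase(conn), 60, TimeUnit.SECONDS)) { res = executeCopyCommand(); }`, where 60 seconds is an arbitrary example. If the deadline fires first, the snapshot may have been taken after the unfreeze, so that backup should be treated as suspect.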

 

I say this because, in an earlier email thread, when I was trying to build this utility purely as a script (using IJ to freeze the database, SH to perform the ZFS snapshot, and IJ again to unfreeze the database), I was told that freeze and unfreeze were not expected to be done from separate connections. In fact, I ran into a problem at that point: the IJ connection for the unfreeze could not be created because the database was frozen.

 

So my question is: is there ever a use case that requires a database to stay frozen after the connection that froze it has been closed or lost?