db-derby-dev mailing list archives

From Oystein.Grov...@Sun.COM (Øystein Grøvlen)
Subject Re: [jira] Commented: (DERBY-239) Need a online backup feature that does not block update operations when online backup is in progress.
Date Tue, 26 Jul 2005 09:49:15 GMT
>>>>> "ST" == Suresh Thalamati <suresh.thalamati@gmail.com> writes:

    ST> Thanks for the input, Øystein.  My comments are in-line.
    ST> Øystein Grøvlen wrote:

    >> I guess the idea is to do a log switch when the data files have
    >> been copied, and copy all the log written before this log switch.


    ST> That's what I plan to do, except that the log switch will not
    ST> happen until after the data files have been copied, when it is
    ST> time to determine the last log file that needs to go into the
    ST> backup and to copy the rest of the log files.

Good idea.  Then the uncertainty about what has really made it into
the backup is limited to just the few seconds before it completes.
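
Roughly, the sequence I understand you are proposing looks like this
(a minimal sketch in Java; the method names are hypothetical, not
Derby's actual internals):

    // Sketch of the proposed online backup sequence.  All method
    // names are illustrative only.
    public class OnlineBackupSketch {

        public void backup(java.io.File backupDir) {
            // 1. Copy the data files while updates continue; some of
            //    the copied pages may be inconsistent at this point.
            copyDataFiles(backupDir);

            // 2. Switch to a new log file.  Everything up to and
            //    including the current log file must go into the backup.
            long lastLogFileNeeded = switchLogFile();

            // 3. Copy the log files; redoing this log during restore
            //    brings the copied pages to a consistent state.
            copyLogFiles(backupDir, lastLogFileNeeded);
        }

        private void copyDataFiles(java.io.File dir) { /* ... */ }
        private long switchLogFile() { return 0L; }
        private void copyLogFiles(java.io.File dir, long upTo) { /* ... */ }
    }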

    ST> From the user's perspective, Derby online backup will include
    ST> all the transactions that were committed before the backup
    ST> completed.  I.e., if the backup starts at 9PM and ends at 10PM,
    ST> the backup will contain data approximately up to 10PM, not 9PM.

    ST> My understanding of what goes into the backup is that the data
    ST> in the backup need not match an exact point in real time.
    ST> Please correct me if this is a wrong assumption.

I agree with you.


    >> One will have to prevent any log files that are needed from
    >> being garbage-collected before they are copied.  I guess that
    >> can be achieved by running backup in a transaction and logging
    >> the start of backup.  That way, the open backup transaction
    >> should prevent garbage-collection of the relevant log files.


    ST> I agree that by writing a log record for the start of backup,
    ST> we can prevent garbage-collection of log files.

    ST> My initial thought is to simply disable garbage-collection of
    ST> log files for the duration of the backup, unless there are some
    ST> specific advantages to writing a backup-start log record.

Disabling garbage-collection directly is probably the cleanest way to
do this.
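
Something as simple as a flag consulted by the code that reclaims old
log files should do.  A minimal sketch (the class shape below is my
assumption, not Derby's actual log factory):

    // Sketch of disabling log-file garbage collection during backup.
    public class LogGcGuardSketch {

        private volatile boolean backupInProgress = false;

        public void beginBackup() { backupInProgress = true; }
        public void endBackup()   { backupInProgress = false; }

        // Called (e.g. at checkpoint) when old log files have become
        // reclaimable.
        public void deleteObsoleteLogFiles(long firstLogNeededByRecovery) {
            if (backupInProgress) {
                // Keep every log file until the backup has copied them.
                return;
            }
            // ... delete log files older than firstLogNeededByRecovery ...
        }
    }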

How will you determine where to start the redo scan at recovery?  Do
you need some mark in the log for that purpose?

    >> One question I have is whether compensation log records (CLRs)
    >> in Derby are self-contained.  If they depend on the original
    >> non-CLR, then log records for transactions that were rolled back
    >> and terminated long before the backup finished will be needed to
    >> be able to redo the CLRs.

    ST> I think CLRs are not self-contained in Derby.  CLR log records
    ST> contain the log instant of the original operation log record
    ST> that was rolled back.

If garbage-collection is disabled, this probably does not matter, but
it means that a start-of-backup log record alone would not be
sufficient to prevent garbage-collection of all the relevant log
records.
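
To illustrate the problem (a much simplified record layout, not
Derby's actual log format):

    // A CLR is not self-contained: it references the log instant of
    // the operation it undoes, so redoing it may require a log record
    // written long before the backup (or its start record) existed.
    class CompensationRecordSketch {

        final long undoInstant; // instant of the original operation record

        CompensationRecordSketch(long undoInstant) {
            this.undoInstant = undoInstant;
        }

        void redo(LogScan log) {
            // The original record may live in a log file older than any
            // record of a transaction still open at backup time.
            Object original = log.fetch(undoInstant);
            // ... apply the undo described by 'original' ...
        }

        interface LogScan { Object fetch(long instant); }
    }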


    >> Generally, we cannot guarantee that operations performed during
    >> backup are reflected in the backup.  If I have understood
    >> correctly, transactions that commit after the data copying is
    >> finished will not be reflected.  Since a user will not be able
    >> to distinguish between operations committed during data copying
    >> and operations committed during log copying, he cannot be sure
    >> concurrent operations are reflected in the backup.


    ST> I agree with you that one cannot absolutely guarantee that
    ST> operations committed up until a particular time are included in
    ST> the backup.  But the backup design depends on the transaction
    ST> log to bring the database to a consistent state, because while
    ST> the data files are being copied, it is possible that some of
    ST> the pages are written to disk.  So we definitely need the
    ST> transaction log until the data files are copied.  If a user
    ST> commits a non-logged operation while the data files are being
    ST> copied, he/she would expect it to be in the backup, just like a
    ST> logged operation.

My point was that a user will not be able to distinguish between the
data file copying period and the log copying period.  Hence, he does
not know whether his operation was committed while the data files
were being copied.

    ST> Please note that non-logged operations in Derby are not
    ST> explicit to the users; most of the non-logged work is done by
    ST> the system without the user's knowledge.

I understand.


    >> This is no more of an issue for a new backup mechanism than it
    >> currently is for roll-forward recovery.  Roll-forward recovery
    >> will not be able to recover non-logged operations either.

    ST> Yes, roll-forward recovery has the same issues.  Once the log
    ST> archive mode that is required for roll-forward recovery is
    ST> enabled, all operations are logged, including operations that
    ST> are not normally logged, like create index.  But I think Derby
    ST> does not currently handle this correctly: it does not force
    ST> logging for non-logged operations that were started before log
    ST> archive mode is enabled.

The cheapest way to handle non-logged operations that started before
backup/archive mode is enabled is to just make them fail and roll
them back.  I think that would be an acceptable solution.
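
In code, the check could be as simple as this (a sketch with
hypothetical transaction hooks, not Derby's actual interfaces):

    // Sketch of failing/blocking unlogged operations around backup.
    class UnloggedOpGuardSketch {

        private volatile boolean backupMode = false;

        // Called when backup (or log archive mode) is being enabled.
        void enterBackupMode(Iterable<Transaction> activeTransactions) {
            backupMode = true;
            for (Transaction t : activeTransactions) {
                if (t.hasUnloggedWork()) {
                    t.abort(); // cheapest option: fail and roll back
                }
            }
        }

        // Called when a new unlogged operation is about to start.
        void checkUnloggedAllowed() {
            if (backupMode) {
                throw new IllegalStateException(
                    "unlogged operations are blocked during backup");
            }
        }

        interface Transaction {
            boolean hasUnloggedWork();
            void abort();
        }
    }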

    >> If users need that, we should provide logged versions of these
    >> operations.

    ST> I think that during backup, non-logged operations should either
    ST> be logged by the system or blocked.

I think blocking them should be acceptable to most users.

    ST> If users are really concerned about performance, they will not
    ST> execute them in parallel with the backup.

This advice may work for backup, but not for enabling roll-forward
recovery.  If I were a user concerned with performance, I think I
would prefer to still create an index unlogged and rather recreate it
if recovery is needed.  (I guess this would require roll-forward
recovery to ignore updates to non-existing indexes.)  I could limit
the vulnerability by making a backup after the unlogged operations
have been performed.
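
On the redo side, ignoring such updates could look roughly like this
(hypothetical shapes, not Derby's recovery code):

    // Sketch: roll-forward recovery skips log records that target a
    // container (e.g. an index created unlogged) that no longer exists.
    class RedoSkipSketch {

        void redo(LogRecord rec, PageStore store) {
            if (!store.containerExists(rec.containerId())) {
                // The index was created unlogged and is not in the
                // backup; the user recreates it after recovery.
                return;
            }
            rec.apply(store);
        }

        interface LogRecord { long containerId(); void apply(PageStore s); }
        interface PageStore { boolean containerExists(long id); }
    }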

By the way, how is normal recovery of unlogged operations handled? Is
the commit of unlogged operations delayed until all data pages created
by the operation have been flushed to disk?
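
I would expect something along these lines (a sketch of what such a
delayed commit might look like; whether Derby actually does this is
exactly my question):

    // Sketch: commit of an unlogged operation waits until all pages it
    // created are durable, since there is no log to redo them from.
    class UnloggedCommitSketch {

        void commitUnlogged(Iterable<Page> pagesCreated, LogFactory log) {
            for (Page p : pagesCreated) {
                p.flushToDisk(); // data must be on disk before commit
            }
            log.writeCommitRecord(); // only now does the commit succeed
        }

        interface Page { void flushToDisk(); }
        interface LogFactory { void writeCommitRecord(); }
    }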

-- 
Øystein

