db-derby-dev mailing list archives

From Suresh Thalamati <suresh.thalam...@gmail.com>
Subject Re: [jira] Commented: (DERBY-239) Need a online backup feature that does not block update operations when online backup is in progress.
Date Wed, 26 Oct 2005 08:36:15 GMT
Hi Øystein,

Thanks for reviewing the patch. My answers are in-line...

Øystein Grøvlen (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12355620 ]

> Øystein Grøvlen commented on DERBY-239:
> ---------------------------------------
> I have looked at the onlinebackup_1.diff patch, and overall the patch
> looks good.  However, I think there is one severe bug in
> RAFContainer.java.  I guess due to the cut and paste from writePage(),
> you are seeking in another file than the one you are writing
> to (fileData.seek(), but backupRaf.write()).

fileData.seek() in the backup routine is wrong; I will fix it. Good catch.
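For what it's worth, the mismatch can be sketched like this (a toy illustration; the class and parameter names only mirror the patch, this is not the actual RAFContainer code):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch of the bug discussed above: the backup routine
// positions one file handle (fileData) but writes through another
// (backupRaf), so pages land at the backup file's current offset
// instead of at pageOffset.
public class SeekWriteSketch {

    // Buggy shape: seek on the source container, write on the backup.
    public static void writePageBuggy(RandomAccessFile fileData,
                                      RandomAccessFile backupRaf,
                                      long pageOffset,
                                      byte[] page) throws IOException {
        fileData.seek(pageOffset);   // BUG: the wrong file is positioned
        backupRaf.write(page);
    }

    // Fixed shape: seek and write go through the same handle.
    public static void writePageFixed(RandomAccessFile backupRaf,
                                      long pageOffset,
                                      byte[] page) throws IOException {
        backupRaf.seek(pageOffset);
        backupRaf.write(page);
    }
}
```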

> I have a few other comments and questions:
> General:
>     * If I run backup with this patch, it seems like I will run the
>       new code.  Does one not need to change the restore code to be
>       able to restore such a backup, or does the ordinary
>       recovery handle that?

The backup structure is not changed, so the current file restore code 
should work. The new issues related to online backup are log replay 
issues, such as redoing CREATE CONTAINER, new page creation, etc. I 
think most of these issues are already addressed as part of rollforward 
recovery. I plan to rescan the restore code and do some more testing to 
find any cases that are not handled by the current code.

> RawStore.java:
>     * The BACKUP_FILTER now contains so much, that it would be useful
>       to have a comment that says what is really left to copy.

I agree, comments would be helpful here. I am not sure BACKUP_FILTER 
will be needed any more. Currently the JAR directory is the only thing 
it does not filter; I plan to make JARs a separate copy after the data 
segment as well, as discussed in my previous e-mail.

>     * Intuitively, it seems wrong to hard-code "seg0", but I see that
>       this is done all over the code.  

Could you please file a JIRA entry for this one? I noticed it too; 
hard-coding is definitely not the correct way here. One of us can 
clean this up.

> Will there always be only one segment?

Yes, at least for now. It is hardwired somewhere in the access layer, 
I think.

> What is then the purpose of the segment concept?

I think the initial design plan was to support some kind of data 
partitioning by storing fragments of tables/indexes on different 
segments, or by distributing tables into different segments. Each 
segment could then be placed on a different disk, for performance 
reasons or to avoid the Derby database size being limited by a single 
disk/volume.

>     * backup(File ...) seems like it would create an endless recursion
>       if called.  Fortunately, it seems like it never will be
>       called. Why do we need the methods with a File parameter instead
>       of a String?  The system procedures use the String variant.
>       Maybe we could just remove the File variant?

I think they are leftovers from the old Cloudscape versions; the File 
variant is not supported any more, so these can be cleaned up. There 
should be some File-variant calls in the access layer as well. Could 
you please file a JIRA for this cleanup, so that these won't be 
forgotten again?

>     * I think it would be helpful with a comment that explained why
>       disableLogArchiveMode() was made synchronized.

Yes, I definitely need to add comments about this synchronization.
This was part of my attempt to prevent parallel backup actions, which 
I did not complete as part of this patch.

> BaseDataFileFactory.java:
>     * I do not think basing which files to back up on the contents of
>       the seg0 directory is very robust.  What if someone by accident has
>       written a file with a name that matches the pattern you are
>       looking for.  Then I would think you may get a very strange
>       error message that may not be easy to resolve.  Could not this
>       be based on some system catalog? 

I think users should not be creating or deleting any files in seg0 to 
start with; the pattern matching is just a simple precaution. I thought 
of scanning the catalogs to find the containers to back up, but could 
not convince myself that it is required, at least for now.

If you think scanning the catalogs is more robust, this enhancement can 
easily be done at a later point.
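A sketch of such a precautionary filename check (the "c&lt;hex digits&gt;.dat" pattern here is my assumption for illustration, not necessarily Derby's exact seg0 container naming rule):

```java
import java.util.regex.Pattern;

// Illustrative filter for deciding whether a file in seg0 looks like a
// container file worth backing up. The naming pattern is an assumption
// made for this sketch.
public class ContainerNameFilter {

    // Assumed shape: 'c' followed by hex digits, with a .dat suffix.
    private static final Pattern CONTAINER =
        Pattern.compile("c[0-9a-fA-F]+\\.dat");

    // matches() requires the whole name to fit the pattern, so stray
    // files such as editor backups are rejected.
    public static boolean looksLikeContainer(String fileName) {
        return CONTAINER.matcher(fileName).matches();
    }
}
```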

>  Another scenario is if someone
>       by accident deletes a file for a table that is not accessed very
>       often.  A later backup will then not detect that this file is
>       missing.  Since the backup is believed to be successful, the
>       latest backup of this file may be deleted.

The backup is only as good as the database in this case :-)  If users 
access the deleted file's table from the backup DB, it will fail with 
a 'no container' error, just as it will on the main database.

Even with the catalog approach, backup could only throw a warning about 
the deleted container. Once a user deletes a container file, there is 
no way out; users cannot even drop that table. I don't think making 
backup fail forever when this scenario occurs would be acceptable to 
users.

> FileContainer.java:
>     * I cannot find anything backup-specific about getPageForBackup(), so
>       I think a more general name would be better (e.g., getLatchedPage).

Yes, currently this routine does not have any backup-specific code. I 
would like to get the page from the cache with a specific weight for 
the backup page in the future, when the weight-based cache mechanism 
is implemented.

> RAFContainer.java:
>     * The changes to this file seem not to be quite in line with some
>       of the original design philosophies.  I am not sure that is
>       necessarily bad, but it would be nice to hear the arguments for
>       doing it this way.  More specifically:
>           - While RAFContainer so far has used the
>             StorageRandomAccessFile/StorageFile abstractions, backup
>             use RandomAccessFile/File directly.  Is there a particular
>             reason for that?

Yes. Currently there is no API available for users to specify the type 
of storage factory to use for the backup. That would be a good 
enhancement to add later.

The online backup implementation does what the current backup does, 
i.e. it uses the disk I/O calls directly instead of the database 
storage factory. I think one reason to do it this way is that if a 
database is using an in-memory storage factory, users can still back 
up the database onto a disk. If the storage factory used by the 
database is java.io.*, then backup is doing the same thing without 
going through the storage factory interface.

>           - In order to be able to backup a page, new methods have
>             been added to FileContainer and BasePage/CachedPage, and
>             the RAFContainer is doing latching of pages.  This
>             increases coupling to other modules, have alternative
>             designs been considered?

I was considering the following minor changes here:

1) Instead of getting the pageData byte array from the Page object, 
make a request to the page to write itself to the backup (like 
page.backup(....)).

2) Move the backing up and unlatching of the page code to the 

I have not thought of any better way to design this; any ideas would 
be helpful.

>     * privBackupContainer(): 
>           - Setting 'done=true' at the start of the method is a bit
>             confusing.  I would think another name for this variable
>             would be better.

I will change it to 'backupComplete = false' and reverse the exit 
logic, hopefully that will make it more readable.

>           - If an exception is thrown while holding a latch, does one
>             not need to release the latch?

Yes, the latch has to be released. I will make sure no latches are 
held when an error occurs.
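The usual shape for guaranteeing that is a try/finally; a toy sketch (Page, unlatch(), and copyToBackup are illustrative stand-ins, not the real store API):

```java
// Sketch of releasing a page latch even when the backup copy throws.
public class LatchSketch {

    // Stand-in for a latched page; the real class lives in the store.
    public interface Page {
        void unlatch();
    }

    // The latch is assumed to be held on entry; try/finally guarantees
    // it is released on both the success and the error path.
    public static void backupPage(Page page, Runnable copyToBackup) {
        try {
            copyToBackup.run();   // may throw
        } finally {
            page.unlatch();
        }
    }
}
```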

>     * writeToBackup(): 
>           - copies a lot of code from writePage().  One should
>             consider factoring out common code.

I will move the common code into a separate routine.

>           - Due to the cut and paste, you are seeking in another file
>             than the one you are writing to. (fileData.seek(), but
>             backupRaf.write())

You are right, I will fix it.

> LogToFile.java:
>     * In getFirstLogNeeded why did you need to change
>       getFirstLogNeeded() to handle a null checkpoint?  Is it in case
>       you do backup before any checkpoint has been performed?

No, there will be a checkpoint before the backup. I added the NULL 
check for testing drop-table stub cases without the checkpoint, and I 
also noticed a null pointer error when shutting down in corrupt cases 
if a checkpoint had not occurred by that time. So I thought there was 
no harm in adding a NULL check there.
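A sketch of that guard (CheckpointInfo and the -1 sentinel are illustrative stand-ins, not LogToFile's actual types):

```java
// Sketch of guarding getFirstLogNeeded() against a missing checkpoint:
// before the first checkpoint (or on a corrupt store during shutdown)
// there may be no checkpoint record yet.
public class FirstLogNeededSketch {

    // Stand-in for the checkpoint record.
    public static class CheckpointInfo {
        public final long undoLowWaterMark;
        public CheckpointInfo(long undoLowWaterMark) {
            this.undoLowWaterMark = undoLowWaterMark;
        }
    }

    // Returns -1 (an assumed "no log needed" sentinel for this sketch)
    // instead of dereferencing a null checkpoint.
    public static long getFirstLogNeeded(CheckpointInfo checkpoint) {
        return (checkpoint != null) ? checkpoint.undoLowWaterMark : -1L;
    }
}
```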

>     * The javadoc for startLogBackup() says that 'log files are copied
>       after all the data files are backed up', but at the end of the
>       method log files are copied.
> ReadOnly.java/InputStreamContainer.java:  
>     * I do not think javadoc should just be a copy of the
>       interface/superclass, but say something about what is particular
>       to this implementation.  In this case, the Javadoc should say
>       that nothing is done.
> typos:
>     * several occurrences of 'backedup'
>     * LogToFile.java: 'eventhough'
>     * RAFContainer.java: 'pahe cache', 'conatiner'
>     * 'Standard Cloudscape error policy'.  Should this be changed to
>       Derby?

I will correct the comments.

> Pet peeves:
>     * Should have Javadoc on all methods, parameters and return
>       values.
>     * Lines longer than 80 chars.

They sound like good ones to follow. I will add the javadoc and fix 
any long lines.

