Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Message-ID: <778178074.1130595719375.JavaMail.jira@ajax.apache.org>
Date: Sat, 29 Oct 2005 16:21:59 +0200 (CEST)
From: "Mike Matrigali (JIRA)" <derby-dev@db.apache.org>
To: derby-dev@db.apache.org
Subject: [jira] Commented: (DERBY-662) during crash recovery of a drop table,
 on case insensitive files systems derby may delete wrong file
In-Reply-To: <1511240961.1130595595681.JavaMail.jira@ajax.apache.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/DERBY-662?page=comments#action_12356270 ] 

Mike Matrigali commented on DERBY-662:
--------------------------------------

I will be submitting a fix shortly; it took me a lot longer to figure out the scenario that causes the bug than to write the fix. The problem is a rarely used path through store that does not do the conglomerate-number-to-hex conversion.
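
To make the failure mode concrete, here is a minimal sketch. It is not Derby's store code: the class and method names (ContainerNames, hexFileName, rawFileName, sameFileIgnoringCase) are hypothetical, and the shape of the faulty path is an assumption. The only facts taken from this issue are that conglomerate 2080 maps to c820.dat, that conglomerate 8320 maps to c2080.dat, and that a delete of C2080.dat removes c2080.dat on a case insensitive file system.

```java
import java.util.Locale;

public class ContainerNames {
    // Normal path: the conglomerate number is converted to hex,
    // so 2080 becomes c820.dat and 8320 becomes c2080.dat.
    public static String hexFileName(long conglomerateNumber) {
        return "c" + Long.toHexString(conglomerateNumber) + ".dat";
    }

    // ASSUMPTION: illustrative shape of the faulty path, which uses the
    // number as-is and skips the hex conversion.
    public static String rawFileName(long conglomerateNumber) {
        return "C" + conglomerateNumber + ".dat";
    }

    // On a case insensitive file system, two names that differ only in
    // case refer to the same file.
    public static boolean sameFileIgnoringCase(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(hexFileName(2080));  // file of the dropped table
        System.out.println(hexFileName(8320));  // file of an unrelated table
        System.out.println(rawFileName(2080));  // name the faulty path asks for
        // True: the faulty name hits the unrelated table's file.
        System.out.println(sameFileIgnoringCase(rawFileName(2080), hexFileName(8320)));
    }
}
```

On a case sensitive file system the two names differ, which is consistent with the problem only showing up where file names are case insensitive.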

> during crash recovery of a drop table, on case insensitive files systems derby may delete wrong file
> ----------------------------------------------------------------------------------------------------
>
>          Key: DERBY-662
>          URL: http://issues.apache.org/jira/browse/DERBY-662
>      Project: Derby
>         Type: Bug
>   Components: Store
>     Versions: 10.1.1.1
>  Environment: jvm/os/filesystem where file names are case insensitive such that delete of C2080.dat will remove c2080.dat if it exists.
>     Reporter: Mike Matrigali
>     Assignee: Mike Matrigali
>     Priority: Blocker

>
> Sometimes during redo the system will incorrectly remove the file associated
> with a table.  The bug requires the following conditions to reproduce:
> 1) The OS/filesystem must be case insensitive such that a request to delete
>    a file named C2080.dat would also remove c2080.dat.  This is true of the
>    default Windows file systems; it is not true of any Unix/Linux file
>    systems that I am aware of.
> 2) The system must not be shut down cleanly, such that a subsequent
>    access of the database causes a REDO recovery action of a drop table
>    statement.  This means that a drop table statement must have happened
>    since the last checkpoint in the log file.  Examples of things that cause
>    checkpoints are:
>    o clean shutdown from ij using the "exit" command
>    o clean shutdown of database using the "shutdown=true" url
>    o calling the checkpoint system procedure
>    o generating enough log activity to cause a regularly scheduled checkpoint.
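> 3) If the conglomerate number of the dropped table described above is TABLE_1,
>    then for a problem to occur there must also exist in the database a table
>    whose conglomerate number TABLE_2 satisfies HEX(TABLE_2) = TABLE_1 (for
>    example TABLE_1 = 2080 and TABLE_2 = 8320, since 8320 in hex is 2080).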
> 4) Either TABLE_2 must not be accessed during REDO prior to the REDO operation
>    of the drop of TABLE_1, or there must be enough other table references during
>    the REDO phase to push the cached open of TABLE_2 out of the cache.
> If all of the above conditions are met then during REDO the system will
> incorrectly delete TABLE_2 while trying to redo the drop of TABLE_1.
>
> I will be adding the following test to reproduce the problem:
> 1) create 500 tables; enough tables are needed to ensure that conglomerate
>    numbers 2080 (c820.dat) and 8320 (c2080.dat) exist.
> 2) checkpoint the database so that create does not happen during REDO
> 3) drop the table with conglomerate number 2080, mapping to c820.dat.  The
>    test looks the table up in the catalog in case conglomerate number
>    assignment changes for some reason.
> 4) exit the database without a clean shutdown; this is the default for test
>    suites which run multiple tests in a single db - no clean shutdown is done.
>    Since the single drop is the only operation after the last checkpoint,
>    the subsequent REDO will replay that drop.
> 5) run next test program dropcrash2, which will cause redo of the drop.  At
>    this point the bug will cause file c2080.dat to be incorrectly deleted and
>    thus accesses to conglomerate 8320 will throw container does not exist
>    errors.
> 6) check the consistency of the database which will find the container does
>    not exist error.
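
The test steps above can be simulated without a database. The following is a minimal sketch under stated assumptions, not the actual test: DropRedoSimulation and all of its methods are hypothetical names, a TreeSet ordered by String.CASE_INSENSITIVE_ORDER stands in for a case insensitive directory, and faultyRedoDrop is an assumed model of the path that skips the hex conversion.

```java
import java.util.Set;
import java.util.TreeSet;

public class DropRedoSimulation {
    // Conglomerate number whose hex file name matches the decimal digits
    // of the dropped one, i.e. the TABLE_2 with HEX(TABLE_2) = TABLE_1.
    // For numbers below 10 the result equals the input, so no other
    // table is affected.
    public static long victimOf(long dropped) {
        return Long.parseLong(Long.toString(dropped), 16);
    }

    // Stand-in for a directory on a case insensitive file system.
    public static Set<String> newCaseInsensitiveDir() {
        return new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    }

    // File name with the hex conversion applied, as in c820.dat for 2080.
    public static String hexName(long conglomerateNumber) {
        return "c" + Long.toHexString(conglomerateNumber) + ".dat";
    }

    // ASSUMPTION: model of the faulty redo, deleting by the raw number.
    public static void faultyRedoDrop(Set<String> dir, long dropped) {
        dir.remove("C" + dropped + ".dat");
    }

    public static void main(String[] args) {
        Set<String> dir = newCaseInsensitiveDir();
        dir.add(hexName(2080)); // table that gets dropped
        dir.add(hexName(8320)); // unrelated table
        faultyRedoDrop(dir, 2080);
        System.out.println("victim of 2080: " + victimOf(2080));
        System.out.println("files left: " + dir);
    }
}
```

In this model the file of conglomerate 8320 disappears while c820.dat is left behind, which matches the container does not exist error described in step 5.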
