db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5284) A derby crash at exactly right time during a btree split can cause a corrupt db which can not be booted.
Date Sat, 18 Jun 2011 17:21:47 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051562#comment-13051562 ]

Mike Matrigali commented on DERBY-5284:
---------------------------------------

I found this problem by code inspection and have not reproduced it myself.
This problem is another code path of the same problem fixed by DERBY-5258.
I believe that either DERBY-5258 or this issue is causing the problems that
have been reported as DERBY-5281 and DERBY-5248.  See those 2 issues for a
detailed description of the order of log records and the crash timing
necessary to reproduce this problem.

At a high level of abstraction, the current code does:

get latch
purge row
release latch
close table
commit

If another transaction gets the latch and inserts rows between the release of
the latch and the commit, and the system crashes before the commit, then this
problem can happen.  The fix is to not release the latch explicitly, and to
let the commit release it.
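
The interleaving above can be replayed with a toy model. This is only a
sketch to illustrate the ordering, not Derby's store code: the class
PurgeLatchSketch, the nested Page class, and the method undoOfPurgeSucceeds
are all hypothetical names, and the page is modelled as a fixed number of row
slots rather than bytes. In the real system a second transaction would block
on the latch; here it simply does not run its insert.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the purge/insert/crash ordering. Not Derby's actual classes. */
public class PurgeLatchSketch {

    /** Hypothetical stand-in for a full btree leaf page with fixed row slots. */
    static final class Page {
        final int capacity;
        final List<String> rows = new ArrayList<>();

        Page(int capacity) { this.capacity = capacity; }

        boolean hasFreeSpace() { return rows.size() < capacity; }
    }

    /**
     * Replays: T1 purges a row, T2 may insert, crash before T1 commits,
     * recovery undoes the purge. Returns true if undo finds room for the row.
     *
     * holdLatchUntilCommit = false models "release latch" before "commit";
     * holdLatchUntilCommit = true models the fix (commit releases the latch).
     */
    public static boolean undoOfPurgeSucceeds(boolean holdLatchUntilCommit) {
        Page page = new Page(2);
        page.rows.add("a");
        page.rows.add("b"); // the page is now full

        // T1: get latch, purge row.
        boolean latchHeldByT1 = true;
        String purged = page.rows.remove(1);

        // Current code: release latch here, before the commit.
        if (!holdLatchUntilCommit) {
            latchHeldByT1 = false;
        }

        // T2: can only touch the page if T1 no longer holds the latch.
        if (!latchHeldByT1 && page.hasFreeSpace()) {
            page.rows.add("c"); // T2 reuses the purged space and commits
        }

        // Crash before T1 commits. Recovery must undo the purge, which
        // needs the space the purge freed.
        if (!page.hasFreeSpace()) {
            return false; // no room to put the purged row back
        }
        page.rows.add(purged);
        return true;
    }

    public static void main(String[] args) {
        System.out.println("release latch before commit: undo succeeds = "
                + undoOfPurgeSucceeds(false));
        System.out.println("hold latch until commit:     undo succeeds = "
                + undoOfPurgeSucceeds(true));
    }
}
```

In this model, holding the latch until commit keeps the purged space reserved
for the purging transaction, so undo always has room for the row.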

> A derby crash at exactly right time during a btree split can cause a corrupt db which can not be booted.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5284
>                 URL: https://issues.apache.org/jira/browse/DERBY-5284
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.3.0, 10.6.1.0, 10.7.1.1, 10.8.1.2
>            Reporter: Mike Matrigali
>            Assignee: Mike Matrigali
>
> A derby crash at exactly the wrong time during a btree split can cause a
> corrupt db which can not be booted.
> A problem in the split code and exactly the wrong timing of a crash can
> leave the database in a state where undo of purge operations corrupts index
> pages during redo and can cause recovery boot to never succeed, and thus the
> database never to be booted.  At a high level what happens is that a purge
> happens on a page, and before it commits another transaction uses the space
> of the purge to do an insert and then commits; then the system crashes
> before the purging transaction gets a chance to commit.  During undo the
> purge expects there to be space to undo the purge, but there is not, and it
> corrupts the page in various ways depending on the size and placement of the
> inserts.  The error that is actually returned to the user varies from sane
> to insane, as the problem is actually noticed after the corruption occurs
> rather than during the undo.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
