db-derby-dev mailing list archives

From "Suresh Thalamati (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-96) partial log record writes that occur because of out-of order writes need to be handled by recovery.
Date Fri, 11 Feb 2005 19:18:14 GMT
     [ http://issues.apache.org/jira/browse/DERBY-96?page=comments#action_59041 ]
     
Suresh Thalamati commented on DERBY-96:
---------------------------------------


The conclusion was to solve this problem by writing a checksum log record before
writing the log buffer, and verifying the checksum
during recovery.
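
As a rough illustration of the idea only (this is not Derby's actual log format; the class name, the method names and the [length][checksum][data] layout are assumptions made for the example), a checksum can be computed over the log buffer before it is written and checked again at recovery, here with the standard java.util.zip.CRC32:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: prefix a log buffer with a checksum record and
// verify it later. Layout (assumed): [int dataLength][long crc][data].
final class ChecksumLogSketch {
    static final int HEADER = 4 + 8;

    // Called before the log buffer goes to disk.
    static byte[] wrapWithChecksum(byte[] logBuffer) {
        CRC32 crc = new CRC32();
        crc.update(logBuffer, 0, logBuffer.length);
        ByteBuffer out = ByteBuffer.allocate(HEADER + logBuffer.length);
        out.putInt(logBuffer.length);
        out.putLong(crc.getValue());
        out.put(logBuffer);
        return out.array();
    }

    // Called during recovery: true only if every byte covered by the
    // checksum record matches what was originally written.
    static boolean verify(byte[] onDisk) {
        if (onDisk.length < HEADER) {
            return false;
        }
        ByteBuffer in = ByteBuffer.wrap(onDisk);
        int len = in.getInt();
        long stored = in.getLong();
        if (len < 0 || len > in.remaining()) {
            return false;
        }
        CRC32 crc = new CRC32();
        crc.update(onDisk, HEADER, len);
        return crc.getValue() == stored;
    }
}
```

With a layout like this, recovery would stop replaying at the first buffer for which verify() returns false, regardless of which sectors happened to reach the disk.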
 

I don't know how to link a derby-dev list e-mail to JIRA, so I am just
copying and pasting the comments from the e-mail list.
Mike Matrigali wrote:


>>I think that some fix to this issue should be implemented for the next
>>release.  The order of my preference is #2, #1, #3.
>>  
>>


I believe option #2 (checksumming log records in the log buffers before
writing to the disk) is a good fix for this problem.
If there are no objections to this approach, I will start to work on
this.


-suresht



>>I think that option #2 can be implemented in the logging system and
>>requires very few, if any, changes to the rest of the system's processing
>>of log records.  Log record offsets remain efficient, i.e. they can use
>>LSNs directly.  Only the boot-time recovery code need look for the
>>new log record and do the work to verify checksums; online abort is
>>unaffected.
>>
>>I would like to see some performance numbers on the checksum overhead,
>>and if it is measurable then maybe some discussion on checksum choice.
>>An obvious first choice would seem to be the standard Java-provided one
>>used on the data pages.  If I had it to do over, I would probably have
>>used a different approach on the data pages.  The point of the checksum
>>on the data page is not to catch data sector write errors (the system
>>expects the device to catch those); the only point is to catch
>>inconsistent sector writes (i.e. the 1st and 2nd 512-byte sectors written
>>but not the 3rd and 4th), and for this the current checksum is overkill.
>>One need not checksum every byte on the page for this:
>>one can guarantee a consistent write with 1 bit per sector in the page.
>>
>>In the future we may want to revisit #3 if it looks like the stream log
>>is an I/O bottleneck which can't be addressed by striping or some other
>>hardware help like smart caching controllers.  I see it as a performance
>>project rather than a correctness project.  It also is a lot more work
>>and risk.  Note that this could be a good project for someone wanting to
>>do some research in this area as it is implemented as a derby module
>>where an alternate implementation could be dropped in if available.
>>
>>While I believe that we should address this issue, I should also note
>>that in all my time working on Cloudscape/Derby I have never received a
>>problem database (in that time any log-related error would have come
>>through me) that resulted from this out-of-order/incomplete log
>>write issue. This of course does not mean it has not happened, just that
>>it was not reported to us and/or did not affect the database in a
>>noticeable way.  We have actually never seen an out-of-order write on
>>the data pages either; we have seen a few checksum errors, but all of
>>those were caused by a bad disk.
>>
>>On the upgrade issue, it may be time to start an upgrade thread.  Here
>>are just some thoughts.  If doing option #2, it would be nice if the
>>new code could still read the old log files and then optionally
>>write the new log record or not.  Then if users wanted to run a
>>release in a "soft" upgrade mode where they needed to be able to
>>go back to the old software they could - they just would not get
>>this fix.  On a "hard" upgrade the software should continue to read
>>the old log files as they are currently formatted, and for any new
>>log files it should begin writing the new log record.  Once the new
>>log record makes its way into the log file, accessing the db with the
>>old software is unsupported (it will throw an error as it won't know
>>what to do with the new log record).
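
Mike's remark above that a consistent write can be guaranteed with 1 bit per sector can be read roughly as follows. This is only a sketch of one possible interpretation, not anything that exists in Derby: the class name, the choice of the low bit of each sector's last byte, and the 512-byte sector size are all assumptions. Every sector of a page carries the same stamp bit, and the bit is flipped on each write, so a page that mixes old and new sectors shows disagreeing bits.

```java
// Hypothetical sketch of torn-write detection with one stamp bit per sector.
final class SectorStampSketch {
    static final int SECTOR = 512; // assumed sector size

    // Set the low bit of the last byte of every sector to stampBit.
    // A real format would have to save the displaced data bits elsewhere.
    static void stamp(byte[] page, boolean stampBit) {
        for (int end = SECTOR - 1; end < page.length; end += SECTOR) {
            page[end] = (byte) (stampBit ? (page[end] | 1) : (page[end] & ~1));
        }
    }

    // A page is consistent when all of its sectors carry the same stamp bit.
    static boolean isConsistent(byte[] page) {
        if (page.length < SECTOR) {
            return true; // nothing to compare
        }
        int first = page[SECTOR - 1] & 1;
        for (int end = SECTOR - 1; end < page.length; end += SECTOR) {
            if ((page[end] & 1) != first) {
                return false;
            }
        }
        return true;
    }
}
```

The check touches one byte per sector instead of every byte on the page, which is the cost argument made above. It only detects a mix of two successive versions of the page; it says nothing about corrupted bytes inside a sector, which the device is expected to catch.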

> partial log record writes that occur because of out-of order writes need to be handled by recovery.
> ---------------------------------------------------------------------------------------------------
>
>          Key: DERBY-96
>          URL: http://issues.apache.org/jira/browse/DERBY-96
>      Project: Derby
>         Type: New Feature
>   Components: Store
>     Versions: 10.0.2.1
>     Reporter: Suresh Thalamati
>     Assignee: Suresh Thalamati

>
> An incomplete log record write caused by out-of-order partial
> writes is recognized as complete during
> recovery if the first sector and the last sector happen to get written.
>  The current system recognizes incompletely written log records by checking
> the length of the record, which is stored at both the beginning and the end.
>  The format in which log records are written to disk is:
>   +----------+--------------+----------+
>   |  length  |  LOG RECORD  |  length  |
>   +----------+--------------+----------+
> This mechanism works fine if sectors are written sequentially or if the
> log record size is less than 2 sectors. I believe that on SCSI-type disks
> the order is not necessarily sequential; SCSI disk drives may sometimes
> reorder sector writes to optimize performance.  If a log record
> that spans multiple disk sectors is being written to a SCSI-type
> device, it is possible that only the first and last sectors are written
> before the crash. If this occurs, the recovery system will incorrectly
> conclude that the log record was completely written and replay it. This
> could lead to recovery errors or data corruption.
> -
> This problem will also not occur if the disk drive has a write cache with
> a battery backup, which ensures that the I/O request completes.
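
The failure described in the issue can be reproduced in a small simulation. This is a sketch under assumptions (4-byte length fields, 512-byte sectors, unwritten sectors reading back as zeros, and all names invented for the example), not Derby's recovery code: a record framed as length / payload / length still passes the length comparison when only its first and last sectors reach the disk.

```java
import java.nio.ByteBuffer;

// Hypothetical simulation of the length-prefix/length-suffix check.
final class TornWriteSketch {
    static final int SECTOR = 512; // assumed sector size

    // Frame a payload as [int length][payload][int length].
    static byte[] frame(byte[] payload) {
        ByteBuffer b = ByteBuffer.allocate(4 + payload.length + 4);
        b.putInt(payload.length).put(payload).putInt(payload.length);
        return b.array();
    }

    // Length-only completeness check: leading and trailing lengths must agree.
    static boolean looksComplete(byte[] disk) {
        if (disk.length < 8) {
            return false;
        }
        ByteBuffer b = ByteBuffer.wrap(disk);
        int len = b.getInt(0);
        if (len < 0 || len > disk.length - 8) {
            return false;
        }
        return b.getInt(4 + len) == len;
    }

    // Simulate a crash: only the listed sectors reach the disk,
    // every other sector keeps its previous (zeroed) content.
    static byte[] partialWrite(byte[] framed, int... writtenSectors) {
        byte[] disk = new byte[framed.length];
        for (int s : writtenSectors) {
            int from = s * SECTOR;
            if (from < 0 || from >= framed.length) {
                continue;
            }
            int to = Math.min(from + SECTOR, framed.length);
            System.arraycopy(framed, from, disk, from, to - from);
        }
        return disk;
    }
}
```

For a 2040-byte payload the framed record fills exactly four sectors; writing only sectors 0 and 3 leaves both length fields intact, so looksComplete() accepts a record whose middle 1024 bytes never reached the disk.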
