db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Olav Sandstaa <Olav.Sands...@Sun.COM>
Subject Re: [jira] Commented: (DERBY-2254) Assert during log file switch: log file position exceeded max log file size
Date Tue, 27 Feb 2007 11:56:57 GMT

Thanks for the information about log switches and checkpoints. I had to 
use this server for some other work, but now I have restored the crashed 
Derby database and started to look into the failing log files. See 
comments inline.

Suresh Thalamati wrote:
> Olav Sandstaa (JIRA) wrote:
>>     [ 
>> https://issues.apache.org/jira/browse/DERBY-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466475

>> ]
>> Olav Sandstaa commented on DERBY-2254:
>> --------------------------------------
>> Mike, thank you for commenting on this and for explaining what you
>> think are the issues behind this problem. I agree that the best
>> solution seems be to fix this so that the background thread can do
>> log file switching without waiting for the checkpoing to complete.
> checkpoint and log switch can run in parallel (NOT sure we ever tested 
> this case though ). I think problem here is we are dumping both the 
> requests on to the same thread. It is more of performance issues.
> May be in the future log switches/checkpoints/... should be processed 
> on different threads, when cache is configured with large size.
> I think some log size calculations are wrong, if this bug is fixed 
> Derby  should still work by correctly switching log file before 
> hitting the MAX limit.
>> Some questions somebody maybe know the answer to:
>>   -What is the reason for most of the log files having the default
>>    size of 1 MB but a few (4) log files grow to 268 kB before
>>    switching?
> Default log file size to start a Log switch is 1MB and to perform a 
> checkpoint is 10MB. Normally log switch requests are processed by 
> rawStoreDaemon(checkpoint Daemon) thread without blocking the user 
> threads for switch to complete.
> Log switch happens is following three scenarios:
> 1) On log flush call if it finds current log file size > configure log 
> file size,  then log switch request is placed on the rawstore daemon 
> thread.
> 2) As part of checkpoint ,  log switch is done before performing the
>    checkpoint if  log file size > configured log file size.
> 3) While writing log records if  it finds the log file size has
>    reached the MAX_LIMIT.
> 4) During backup, to backup full log files instead of partial portions.
> By looking at the log file sizes you posted my guess is LOG SWITCH
> requests are getting struck behind the checkpoint request. I would
> think with 18GB cache, checkpoint must be taking long time.
> Another case log file size grow bigger than configured size is  if the
> transactions are  long  and nothing in the database triggered a flush 
> call. I am guessing that is not case here , your application must be 
> doing frequent commits; otherwise it is unlikely that there are 1M log 
>  files.

You are right. The test produces small transactions consisting of only 
five operations. So it is not the length of the transactions that make 
the log files grow very large.

>>   -The crash occurs after having produced about 1 GB of log. Is there
>>    a hard limit on 1 GB before the checkpoint must be completed?
> I have not seen any such limit. Unless you have changed the defaults 
> checkpoint is requested every  10MB , unless checkpoint is already in 
> progress.  I think the  checkpoint are taking long time, parallel 
> activity is producing lot of log.   By the way how did you come up
> with 1GB number ?  

I have not changed any defaults. The reason for asking if there was 
something related to 1 GB of log was probably due to I misunderstood a 
comment Mike made (qoute from  DERBY-2254: "my best guess is that the 
act of switching the log file at 1 gig has become stuck behind finishing 
the checkpoint").

> I see only  few 1MB log files and huge files in
> between (268 MB)...  My guess is checkpoint is is in progress when
> those large files are being created.
> If you want to confirm this hypothesis, you can dump the whole log on 
> recovery, or just modify the code to print only checkpoints from the 
> log. you can dump the log by setting the following flags in the debug 
> code line :
> derby.debug.true=DumpLogOnly
> derby.storage.logDumpStart=49

I have tried running with these settings and the resulting 3.2 GB 
derby.log file contains the dump of about 16 million log records. The 
last entries in the log are:

DEBUG LogTrace OUTPUT: scanned 12632271 : EndXact null Committed : 
transactionStatus =  Committed instant = (1211,268435242) logEnd = 
(1211,268435277) logIn at 15 available 4
DEBUG LogTrace OUTPUT: scanned 12632266 : Page Operation: 
Page(33,Container(0, 960)) pageVersion 176243 : Update  Slot=8 
recordId=14 instant = (1211,268435277) logEnd = (1211,268435346) logIn 
at 24 available 29
DEBUG LogTrace OUTPUT: scanned 12632266 : EndXact null Committed : 
transactionStatus =  Committed instant = (1211,268435346) logEnd = 
(1211,268435381) logIn at 15 available 4
DEBUG LogTrace OUTPUT: scanned Null : Checksum Operation  algorithm = 1 
value = 2138849209 data length= 38 instant = (1211,268435381) logEnd = 

As you suggested, I tried to look for any sign of whether a checkpoint 
was in progress or not. But from the dump of the log I can not find any 
sign of checkpoints in the entire log. Do you have any suggestions for 
how a checkpoint would identify itself in the log dump?

If I try to categorize the entries in the log dump they fall into the 
following types (with the number of occurrences  in the first column):

11422649   Page Operation:
2340723    BeginXact
2340734    EndXact
420947     Checksum Operation
29         detected proper log end on log file
29         switched to next log file

None of these seems to indicate that a checkpoint have started?


View raw message