db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Discussion of incremental checkpointing
Date Tue, 07 Feb 2006 21:00:23 GMT
I am hoping someone else will join this discussion; I
am not seeing the benefit vs. runtime cost of the incremental checkpoint
approach.
It could be I am blinded by the systems that I have worked on.  I
just have a gut reaction against adding another queue to the I/O
system.  Going forward I would rather see Derby become more parallel, and
a single queue seems to be the wrong direction.

Again, what problem are you trying to solve?  My guesses:
1) avoid checkpoint I/O saturation (I think this needs to be done, but
  it can be done in the current checkpoint scheme).
2) reduce the number of redo recovery log records (how important is this,
  given the runtime cost of list manipulation?)
3) some other problem?

I think you are trying to solve 2, which I admit I don't see as much
of a problem.  Currently at a high level (ignoring details) we do:
1) start the checkpoint, note the current end of the log file (LOGEND)
2) slowly write all dirty pages in the cache (I agree the current
   scheme needs a fix in this area)
3) when done, write a checkpoint log record indicating the REDO mark at
   LOGEND; by now the end of the log may be at LOGEND + N
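
To make the three steps above concrete, here is a minimal sketch in Java. It
is not Derby's code: the class FullCheckpointSketch, its methods, the use of a
plain counter as the log position, and the page ids 1000 and up for pages
dirtied during the checkpoint are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the current scheme; none of these names come from Derby.
public class FullCheckpointSketch {
    private final Set<Long> dirtyPages = new LinkedHashSet<>();
    private long logEnd = 0;    // position where the next log record would go
    private long redoMark = 0;  // position where redo recovery would start

    /** Logs one update and marks the page dirty in the cache. */
    public void update(long pageId) {
        logEnd++;
        dirtyPages.add(pageId);
    }

    /**
     * Runs one checkpoint. updatesDuringCheckpoint simulates the N log
     * records that other transactions append while pages are being written.
     */
    public long checkpoint(int updatesDuringCheckpoint) {
        long logEndAtStart = logEnd;               // step 1: note LOGEND
        for (int i = 0; i < updatesDuringCheckpoint; i++) {
            update(1000L + i);                     // concurrent activity (assumed)
        }
        dirtyPages.clear();                        // step 2: write all dirty pages
        redoMark = logEndAtStart;                  // step 3: REDO mark = LOGEND
        return redoMark;
    }

    public long logEnd() { return logEnd; }
    public long redoMark() { return redoMark; }
    public int dirtyCount() { return dirtyPages.size(); }
}
```

With three updates before the checkpoint and two during it, the REDO mark
ends up at 3 while the log end is at 5, which is the LOGEND + N gap from
step 3.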

I think what you want is at step 2 to somehow write multiple checkpoint
log records rather than wait until the end.  Let's assume the previous REDO
mark was LOGEND_PREV.  So while writing the pages you would write a new
type of checkpoint record that would move the REDO mark up to somewhere
between LOGEND_PREV and the current end of the log file.  I think I agree
that if you wrote a checkpoint record for every I/O from your ordered list
then you would guarantee minimum redo recovery time.  The current
system writes one log record at the end of all the I/O's, and at the end of
writing all the pages it would match your approach (it is a little hard to
compare, as your approach is continuous, but if you just compare the dirty
page list at LOGEND I think this is true).
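
As a rough illustration of the ordered list and the moving REDO mark, here is
a small Java sketch. It is only one reading of the proposal: the class
FirstDirtyList and its methods are hypothetical, a counter stands in for the
log position, and a LinkedHashMap (which iterates in insertion order) stands
in for the list sorted by first-update time.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed list; not an existing Derby class.
public class FirstDirtyList {
    // page id -> log position at which the page first became dirty.
    // Insertion order means the oldest first-dirtied page is at the head.
    private final LinkedHashMap<Long, Long> firstDirtyLsn = new LinkedHashMap<>();
    private long logEnd = 0;

    /** Logs one update; a page already in the list keeps its place. */
    public long update(long pageId) {
        long lsn = logEnd++;
        firstDirtyLsn.putIfAbsent(pageId, lsn);
        return lsn;
    }

    /** Any writer (checkpoint or background thread) releases the page. */
    public void pageWritten(long pageId) {
        firstDirtyLsn.remove(pageId);
    }

    /** Writes the oldest dirty page; returns its id, or -1 if none is dirty. */
    public long writeOldest() {
        Iterator<Map.Entry<Long, Long>> it = firstDirtyLsn.entrySet().iterator();
        if (!it.hasNext()) {
            return -1;
        }
        long pageId = it.next().getKey();
        it.remove();
        return pageId;
    }

    /** Redo can start at the first-dirty position of the oldest dirty page. */
    public long redoMark() {
        Iterator<Long> it = firstDirtyLsn.values().iterator();
        return it.hasNext() ? it.next() : logEnd;
    }
}
```

Each call to writeOldest() can move the value returned by redoMark() forward,
which is the "checkpoint record for every I/O" case; the cost is the list
maintenance in update() on every page update.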

So again, let's first talk about what you want to solve rather than
how to solve it.  Maybe you have some assumptions about what type of
system you are trying to address, like:  size of cache, percentage of
dirty pages, how long it takes to do a checkpoint vs. recovery time
requirements of the applications?

Raymond Raymond wrote:

> 
>> From: Mike Matrigali <mikem_app@sbcglobal.net>
>>
>> I think this is the right path, though would need more details:
>> o does boot mean first time boot for each db?
>> o how to determine "this machine"
>> o and the total time to run such a test.
>>
>> There are some very quick and useful tests that would be fine to
>> add to the default system and do one time per database.  Measuring
>> how long it takes to do a commit and how long it takes to do a single
>> database read from disk would be fine.  Seems like
>> just these 2 numbers could be used to come up with a very good
>> default estimate of log recovery time per log record.  Then as you
>> propose the actual estimate can be improved by measuring real
>> recovery time in the future.
>>
>> I am not convinced of the need for the bigger test, but if the default
>> is not to run it automatically and it is your "itch" to have such
>> a configuration option then I would not oppose.  I do see great value
>> in coming up with a very good default estimate of recovery time
>> based on the outstanding number of log records.  And
>> I even envision
>> a framework in the future where derby would schedule other non-essential
>> background tasks that have been discussed in the
>>
>> On a different track I am still unclear on the checkpoint dirty page
>> lru list.  Rather than talk about implementation details, I would
>> like to understand the problem you are trying to solve.  For instance
>> I well understand the goal to configure checkpoints such that they
>> map to user understandable concept of the tradeoff of current runtime
>> performance vs. how long am I willing to wait the next time I boot
>> the database after a crash.
>>
>> What is the other problem you are looking at?
>>
> 
> 
> Mike:
> 
> What I am looking at next is to redesign the checkpointing process.
> The current checkpointing mechanism will write out all the dirty pages
> during the checkpoint. That causes a burst of disk I/O. Several people
> have mentioned problems with this, such as DERBY-799, reported by Oystein.
> I have a proposal for incremental checkpointing. I have mentioned it
> before, and I would like to explain it in more detail.
> 
> We should find some way to sort the dirty pages in ascending order of
> the time when they were first updated. The incremental checkpointing
> process will continually write out the dirty pages, from the earliest
> updated dirty page to the latest updated dirty page. The write rate
> depends on how busy the system is.
> There are two situations in which we will update the log control file:
> 1) When data reads or log writes start to have a longer response time
> than an acceptable value, we update the log control file and sleep for a
> while.
> 2) After writing out a certain number of dirty pages.
> 
> The benefits are:
> 1) Since we write out dirty pages from the earliest updated page to the
> latest updated page, the checkpoint instance will keep advancing. Since
> the incremental checkpoint is performed continuously, the checkpoint
> instance will be much closer to the tail of the log than with conventional
> checkpointing.
> 2) The checkpointing process can be paused if the disk I/O becomes really
> busy, and the finished part is an intact checkpoint instance.
> 
> Do you still remember that I suggested establishing a dirty page
> list in which dirty pages are sorted in ascending order of the time when
> they were first updated? I would like to discuss it again.
> 
> Actually the list is not designed to speed up the checkpoint process. It
> is for the incremental checkpointing described above. To make the checkpoint
> instance keep advancing, we should guarantee that the earlier updated pages
> have been written out. That's why I suggested establishing such a list.
> 
> In the last discussion, you also mentioned a problem:
> 
> MM  The downside with the current algorithm is that a page that is
> MM  made dirty after the checkpoint starts will be written, and if it
> MM  gets written again before the next checkpoint we have done 1 too
> MM  many I/O's.  I think I/O optimization may benefit more by working
> MM  on optimizing the background I/O thread than working on the
> MM  checkpoint.
> 
> 
> If the background I/O thread can refer to this list, I think it can help
> solve the problem you mentioned. I am not very familiar with the background
> I/O thread. If I am wrong, please point it out.
> 
> In the list, the dirty pages are sorted in ascending order of the time when
> they were first updated, which means the oldest dirty page is at the head of
> the list and the latest updated dirty page is at the end of the list.
> The operations on the list are:
> - When a page is updated and it is not in the list, we will append it to
> the end of the list.
> - When a dirty page in the list is written out to disk, it will be released
> from the list.
> 
> Let's look at your problem:
>  if a page is made dirty after the checkpoint starts,
> 
> 1) if the page was already dirty before this update, it should be
>  in the list already. We don't need to add it again.
>  When the checkpoint process writes this dirty page out to disk, it will
>  be released from the list, and if the background I/O thread refers to
>  the list, it will know there is no need to write this page out again.
> 2) if this is the first time the page is updated, it will be appended to
>  the end of the list. If the background I/O thread refers to the list, it
>  knows there is no need to write this page out so soon, since it has just
>  been updated.
> 
> 
> Is this reasonable?
> 
> 
> Raymond
> 
> 
> 
