db-derby-dev mailing list archives

From Suresh Thalamati <suresh.thalam...@gmail.com>
Subject Re: Discussion of incremental checkpointing----Added some new content
Date Fri, 10 Feb 2006 21:25:53 GMT

My two cents:
If the goal is to auto-tune the checkpoint interval, I think the 
amount of log generated is a good indication of how fast the
system is. By finding out how fast the log is being generated,
one can predict within a reasonable error how long recovery
will take on that system.

It might be worth finding a simple solution to start with rather than 
trying to get it perfect. How about a simple scheme to tune the 
checkpoint interval like the following:

Say on a particular system:

1) we can generate N amount of log in X amount of time on a running system.

2) R is the fraction of time it takes to recover the log generated in X 
amount of time. One can pre-calculate this factor by doing some tests 
instead of trying to find it on each boot.

3) Y is the recovery time Derby users are likely to see by
default in the worst case.

4) C is the amount of log generation after which a checkpoint should be scheduled.

The ideal checkpoint interval, in terms of log size, should be something like:

    C = (N / X) * (Y / R)


For example :

X = 5 min, N = 50 MB, Y = 1 min, R = 0.5

    C = (50 / 5) * (1 / 0.5) = 20 MB
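
In code the calculation is trivial; just to make the units explicit,
something like this (a sketch only, none of these names exist in Derby):

    // Estimate the checkpoint interval (in MB of log) from the observed
    // log generation rate and a target worst-case recovery time.
    //   logGeneratedMB    = N, generated during observationMinutes = X
    //   targetRecoveryMin = Y
    //   recoveryFraction  = R (recovery time / generation time)
    static double checkpointIntervalMB(double logGeneratedMB,
                                       double observationMinutes,
                                       double targetRecoveryMin,
                                       double recoveryFraction) {
        double logRateMBPerMin = logGeneratedMB / observationMinutes;      // N / X
        return logRateMBPerMin * (targetRecoveryMin / recoveryFraction);   // (N/X)*(Y/R)
    }

    // checkpointIntervalMB(50, 5, 1, 0.5) == 20.0, matching the example above.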


One could use a formula like the above at each checkpoint to tune when 
the next checkpoint should occur. I understand the first estimate will 
be completely off, but the interval will stabilize after a few 
checkpoints.
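
To help it stabilize, the measured rate could also be smoothed across 
checkpoints, e.g. (sketch only; the starting value and smoothing factor 
are arbitrary guesses):

    // Smooth the measured log generation rate (MB/min) across checkpoints
    // so the computed interval settles down instead of jumping around.
    private double smoothedMBPerMin = 10.0;      // arbitrary initial guess
    private static final double ALPHA = 0.3;     // arbitrary smoothing factor

    void recordLogRate(double measuredMBPerMin) {
        smoothedMBPerMin = ALPHA * measuredMBPerMin
                + (1.0 - ALPHA) * smoothedMBPerMin;
    }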

I think irrespective of whatever approach is finally implemented, we 
have to watch for the overhead introduced to get it exactly right; I 
don't think a user will really care if recovery takes a few seconds
more or less.

Aside from tuning the checkpoint interval, I think we should find ways 
to minimize the effect of the checkpoint on system throughput.
I guess that is a different topic.



Thanks
-suresht



Raymond Raymond wrote:
> Mike,
> I am sorry, I did not make it very clear what I am going to do.
> I will explain it now. I think the last thing you mentioned is what
> I want to solve :
> 
> MM  how long it takes to do a checkpoint vs. recovery time
> MM  requirements of the applications?
> 
> I am working on the issue of automatic checkpointing, which makes
> the Derby engine control the checkpointing process by itself depending
> on the runtime situation.  The goal is to make a good balance between
> the runtime resource consumption of the checkpointing process (especially
> disk I/O), and the recovery time. I want to do checkpointing as
> much as possible while interfering as little as possible with the real
> work of Derby. (Most of the system resources I mention here are disk I/O,
> since for the checkpointing issue, disk I/O is the bottleneck among all
> the system resources.)
>  Let's look into the current Derby checkpointing mechanism first. If we
> set the checkpointing interval too short, Derby will do checkpointing very
> often, which will take lots of disk I/O resources, and the responses to the
> requests from clients will be delayed. Conversely, if we set the
> checkpointing interval too long, Derby will keep lots of data in the cache,
> and when a crash happens, the recovery time will be very long. I am trying
> to make a good balance between them.
>  Then, let me show the benefits of my proposal. My basic idea is to do
> checkpointing as much as possible when disk I/O is not busy. I think the
> incremental checkpointing mechanism combined with consideration of the
> runtime situation (what we discussed last time -- "Discussion of how to map
> the recovery time into Xmb of log", about statistics of system performance
> information, the time of a data read or of a log write, etc.) can solve the
> problem.
>  We can imagine that the incremental checkpointing mechanism
> divides the current checkpoint process into several pieces. We let the
> user set up an acceptable recovery time. Then, when the system is not busy,
> we do a piece of checkpoint; when the system becomes busy, we suspend
> the checkpoint process for a while, and so on. But if the log is longer
> than the acceptable length, we try to do checkpointing even if the system
> is busy, to meet the acceptable recovery time set by the user. We make each
> piece of checkpoint an intact checkpointing process by updating the log
> control file.
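
A very rough sketch of the loop being described (every helper name here is 
invented, nothing maps to existing Derby code):

    // One "piece" of checkpoint at a time; each piece is made durable by
    // updating the log control file, so the work done so far is never lost.
    void incrementalCheckpoint() {
        while (hasDirtyPages()) {
            if (ioIsBusy() && logSinceLastCheckpointMB() < acceptableLogMB()) {
                pause();                        // back off while the system is busy
                continue;
            }
            writeNextBatchOfDirtyPages();       // oldest-dirtied pages first
            updateLogControlFile();             // piece becomes an intact checkpoint
        }
    }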
> I think what you described in your comments is what the incremental
> checkpointing will do:
> 
> MM  I think what you want is at step 2 to somehow write multiple
> MM  checkpoint log records rather than wait to the end.  Let's assume 
> previous REDO
> MM  MARK was LOGEND_PREV.  So while writing the pages you would write a new
> MM  type of checkpoint record that would move the REDO mark up to somewhere
> MM  between LOGEND_PREV and current end of log file.  I think I agree that
> MM  if you wrote a checkpoint record for every I/O from your ordered list
> MM  then you would guarantee minimum redo recovery time, where the current
> MM  system writes one log record at end of all I/O's which at the end of
> MM  writing all pages would match your approach (a little hard to compare
> MM  as your approach is continuous but if you just compare the dirty page
> MM  list at LOGEND I think this is true).
> 
> Establishing a dirty page list is just my suggestion. The purpose of it
> is to sort the dirty pages in ascending order of the time when they were
> first updated. If we have any other ideas for doing that without using
> extra memory, we don't have to establish such a list.
> 
> Does anyone have any suggestions on it? Everyone is welcome to give an
> opinion on it.
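
One cheap way to get that ordering might be an insertion-ordered map keyed 
by page identity, roughly like this (a sketch; PageKey/CachedPage just 
stand in for whatever identifies and holds a page in the cache, and 
synchronization is left out):

    // Dirty pages in the order they were first made dirty:
    // java.util.LinkedHashMap preserves insertion order, so the head of
    // the map is always the oldest dirty page.
    private final java.util.Map<PageKey, CachedPage> dirtyPages =
            new java.util.LinkedHashMap<PageKey, CachedPage>();

    void pageMadeDirty(PageKey key, CachedPage page) {
        // only the first update inserts; later updates leave the order alone
        if (!dirtyPages.containsKey(key)) {
            dirtyPages.put(key, page);
        }
    }

    void pageWritten(PageKey key) {
        dirtyPages.remove(key);     // released once it reaches disk
    }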
> 
> 
> 
> Raymond
> 
> 
>> From: Mike Matrigali <mikem_app@sbcglobal.net>
>>
>> I am hoping maybe someone else will get in on this discussion; I
>> am not seeing the benefit vs. the runtime cost of the incremental
>> checkpoint approach.
>> It could be I am blinded by the systems that I have worked on.  I
>> just have a gut reaction against adding another queue to the I/O
>> system; going forward I would rather see Derby more parallel, and
>> a single queue seems to be the wrong direction.
>>
>> Again what problem are you trying to solve? My guesses:
>> 1) avoid checkpoint I/O saturation (I think this needs to be done but
>>   can be done in the current checkpoint scheme).
>> 2) reduce the number of redo recovery log records (how important is this,
>>   at the cost of runtime list manipulation?)
>> 3) some other problem?
>>
>> I think you are trying to solve 2 - which I admit I don't see as much
>> of a problem.  Currently at a high level (ignoring details) we do:
>> 1) start checkpoint, note current end of log file (LOGEND)
>> 2) we slowly write all dirty pages in the cache (I agree we
>>    need a fix in this area to the current scheme)
>> 3) when done write checkpoint log record indicating REDO mark at LOGEND,
>>    now log may be at LOGEND + N
>>
>> I think what you want is at step 2 to somehow write multiple checkpoint
>> log records rather than wait to the end.  Let's assume previous REDO
>> MARK was LOGEND_PREV.  So while writing the pages you would write a new
>> type of checkpoint record that would move the REDO mark up to somewhere
>> between LOGEND_PREV and current end of log file.  I think I agree that
>> if you wrote a checkpoint record for every I/O from your ordered list
>> then you would guarantee minimum redo recovery time, where the current
>> system writes one log record at end of all I/O's which at the end of
>> writing all pages would match your approach (a little hard to compare
>> as your approach is continuous but if you just compare the dirty page
>> list at LOGEND I think this is true).
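
(In that scheme, after each batch of page writes the new REDO mark would 
effectively be the first-dirtied log instant of the oldest page still 
dirty -- a sketch, with invented names:)

    // After flushing a batch from the ordered dirty list, redo no longer
    // needs anything older than the first-dirtied instant of the oldest
    // page that is still dirty.
    long newRedoMark(long currentLogEnd) {
        if (dirtyPages.isEmpty()) {
            return currentLogEnd;                   // nothing dirty: redo from log end
        }
        return oldestRemainingFirstDirtyInstant();  // between LOGEND_PREV and LOGEND
    }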
>>
>> So again, let's first talk about what you want to solve rather than
>> how to solve it.  Maybe you have some assumptions about what type of
>> system you are trying to address, like:  size of cache, percentage of
>> dirty pages, how long it takes to do a checkpoint vs. recovery time
>> requirements of the applications?
>>
>> Raymond Raymond wrote:
>>
>> >
>> >> From: Mike Matrigali <mikem_app@sbcglobal.net>
>> >>
>> >> I think this is the right path, though would need more details:
>> >> o does boot mean first time boot for each db?
>> >> o how to determine "this machine"
>> >> o and the total time to run such a test.
>> >>
>> >> There are some very quick and useful tests that would be fine to
>> >> add to the default system and do one time per database.  Measuring
>> >> how long it takes to do a commit and how long to do a single database
>> >> read from disk would be fine.  It seems like
>> >> just these 2 numbers could be used to come up with a very good
>> >> default estimate of log recovery time per log record.  Then, as you
>> >> propose, the actual estimate can be improved by measuring real
>> >> recovery time in the future.
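
For instance, such an estimate might be a weighted combination of the two 
measurements (this is my sketch; the calibration constants are not from 
Derby and would have to be tuned against measured recovery runs):

    // First-cut estimate of redo time from the two boot-time measurements.
    // c1 and c2 are calibration constants, later refined by comparing the
    // estimate against actual recovery times.
    static double estimatedRecoveryMillis(long outstandingLogRecords,
                                          double avgPageReadMillis,
                                          double avgCommitMillis,
                                          double c1, double c2) {
        return outstandingLogRecords
                * (c1 * avgPageReadMillis + c2 * avgCommitMillis);
    }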
>> >>
>> >> I am not convinced of the need for the bigger test, but if the default
>> >> is not to run it automatically and it is your "itch" to have such
>> >> a configuration option then I would not oppose.  I do see great value
>> >> in coming up with a very good default estimate of recovery time
>> >> based on the outstanding number of log records.  And I even envision
>> >> a framework in the future where Derby would schedule other
>> >> non-essential background tasks that have been discussed in the
>> >> On a different track I am still unclear on the checkpoint dirty page
>> >> lru list.  Rather than talk about implementation details, I would
>> >> like to understand the problem you are trying to solve.  For instance
>> >> I well understand the goal to configure checkpoints such that they
>> >> map to user understandable concept of the tradeoff of current runtime
>> >> performance vs. how long am I willing to wait the next time I boot
>> >> the database after a crash.
>> >>
>> >> What is the other problem you are looking at?
>> >>
>> >
>> >
>> > Mike:
>> >
>> > What I am looking at next is redesigning the checkpointing process.
>> > The current checkpointing mechanism writes out all the dirty pages
>> > during the checkpoint. That causes a burst of disk I/O. Lots of
>> > problems were mentioned by some people, such as DERBY-799 reported by
>> > Oystein. I have a proposal for incremental checkpointing. I have
>> > mentioned it before; I would like to explain it in more detail now.
>> >
>> > We should find some way to sort the dirty pages in ascending order of
>> > the time when they were first updated. The incremental checkpointing
>> > process will continually write out the dirty pages from the earliest
>> > updated dirty page to the latest updated dirty page. The writing rate
>> > is related to the system situation.
>> > There are two situations in which we will update the log control file:
>> > 1) Data reads or log writes start to have a longer response time
>> > than an acceptable value; we update the log control file and sleep for
>> > a while.
>> > 2) After writing out a certain number of dirty pages.
>> >
>> > The benefits of it are:
>> > 1) since we write out dirty pages from the earliest updated page to the
>> > latest updated page, the checkpoint instance will keep advancing. Since
>> > the incremental checkpoint is performed continuously, the checkpoint
>> > instance will be much closer to the tail of the log than with the
>> > conventional checkpointing.
>> > 2) the checkpointing process can be paused if the disk I/O becomes
>> > really busy, and the finished part is an intact checkpoint instance.
>> >
>> > Do you still remember I suggested establishing a dirty page
>> > list in which dirty pages are sorted in ascending order of the time
>> > when they were first updated? I would like to discuss it again.
>> >
>> > Actually the list is not designed to speed up the checkpoint process.
>> > It is for the incremental checkpointing described above. To make the
>> > checkpoint instance keep advancing, we should guarantee the earlier
>> > updated pages have been written out. That's why I suggested
>> > establishing such a list.
>> >
>> > In the last discussion, you also mentioned a problem:
>> >
>> > MM  The downside with the current
>> > MM  algorithm is that a page that is made dirty after the checkpoint
>> > MM  starts will be written, and if it gets written again before the
>> > MM  next checkpoint we have done 1 too many I/O's.  I think I/O
>> > MM  optimization may benefit more by working on optimizing the
>> > MM  background I/O thread than working on the checkpoint.
>> >
>> >
>> > If the background I/O thread can refer to this list, I think it can
>> > help solve the problem you mentioned. I am not very familiar with the
>> > background I/O thread. If I am wrong, please point it out.
>> >
>> > In the list, the dirty pages are sorted in ascending order of the time
>> > when they were first updated, which means the oldest dirty page is at
>> > the head of the list and the latest updated dirty page is at the end
>> > of the list.
>> > The operations on the list are:
>> > - When a page is updated and it is not in the list, we append it to
>> > the end of the list.
>> > - When a dirty page in the list is written out to disk, it is released
>> > from the list.
>> >
>> > Let's look into your problem:
>> >  if a page is made dirty after the checkpoint starts,
>> >
>> > 1) if the page was made dirty before this update, it is supposed to be
>> >  in the list already. We don't need to add it again.
>> >  When the checkpoint process writes this dirty page out to disk, it
>> >  will be released from the list, and if the background I/O thread
>> >  refers to the list, it will know there is no need to write this page
>> >  out again.
>> > 2) if the page is updated for the first time, it will be appended to
>> >  the end of the list. If the background I/O thread refers to the list,
>> >  it knows there is no need to write this page out so soon since it has
>> >  just been updated.
>> >
>> >
>> > Is it reasonable?
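
If the background writer does consult such a list, the check could be as 
simple as this (sketch only, invented names again):

    // The background I/O thread skips pages the incremental checkpoint has
    // already cleaned, and leaves pages near the tail of the list (dirtied
    // only very recently) alone for now.
    boolean backgroundWriterShouldWrite(PageKey key) {
        if (!dirtyPages.containsKey(key)) {
            return false;                    // already written out, or never dirty
        }
        return !isNearTailOfList(key);       // just dirtied -- let it age a bit
    }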
>> >
>> >
>> > Raymond
>> >
> 
> 
> 
> 

