db-derby-dev mailing list archives

From "Raymond Raymond" <raymond_de...@hotmail.com>
Subject Re: Discussion of incremental checkpointing
Date Wed, 08 Feb 2006 05:01:30 GMT
Mike,
  I am sorry I did not make it very clear what I am going to do; let me
explain it now. I think the last thing you mentioned is the problem
I want to solve:

MM  how long it takes to do a checkpoint vs. recovery time
MM  requirements of the applications?

  I am working on the issue of automatic checkpointing, which lets the
Derby engine control the checkpointing process by itself depending
on the runtime situation. The goal is to strike a good balance between
the runtime resource consumption of the checkpointing process (especially
disk I/O) and the recovery time. I want to do as much checkpointing as
possible while interfering as little as possible with Derby's real work.
(Most of the system resource I mention here is disk I/O, since for
checkpointing, disk I/O is the bottleneck among all the system resources.)
   Let's look at the current Derby checkpointing mechanism first. If we
set the checkpointing interval too short, Derby will checkpoint very
often, which consumes a lot of disk I/O and delays the responses to
client requests. Conversely, if we set the checkpointing interval too
long, Derby will keep a lot of dirty data in the cache, and when a crash
happens, the recovery time will be very long. I am trying to strike a good
balance between the two.
   Then, let me show the benefits of my proposal. My basic idea is to
checkpoint as much as possible when disk I/O is not busy. I think the
incremental checkpointing mechanism, combined with consideration of the
runtime situation (what we discussed last time in "Discussion of how to map
the recovery time into Xmb of log": statistics on system performance,
the time of a data read or a log write, etc.), can solve the problem.
Imagine that the incremental checkpointing mechanism divides the current
checkpoint process into several pieces. When the system is not busy, we do
a piece of checkpoint; when the system becomes busy, we suspend the
checkpoint process for a while, and so on. We make each piece of checkpoint
an intact checkpointing process by updating the log control file. I think
what you described in your comments is what the incremental checkpointing
will do:

MM  I think what you want is at step 2 to somehow write multiple
MM  checkpoint log records rather than wait to the end.  Let's assume previous
MM  REDO MARK was LOGEND_PREV.  So while writing the pages you would write a new
MM  type of checkpoint record that would move the REDO mark up to somewhere
MM  between LOGEND_PREV and current end of log file.  I think I agree that
MM  if you wrote a checkpoint record for every I/O from your ordered list
MM  then you would guarantee minimum redo recovery time, where the current
MM  system writes one log record at end of all I/O's which at the end of
MM  writing all pages would match your approach (a little hard to compare
MM  as your approach is continuous but if you just compare the dirty page
MM  list at LOGEND I think this is true).
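To make the piece-by-piece idea above concrete, here is a minimal sketch of
the loop I have in mind. All names here are hypothetical stand-ins, not
Derby's actual cache or log APIs: write a small batch of the oldest-dirtied
pages, advance the REDO mark to the first-dirty LSN of the oldest page still
in the list, and record it in the log control file so the finished piece is
an intact checkpoint.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IncrementalCheckpointSketch {
    /** A dirty page stamped with the log address (LSN) of its first update. */
    static final class DirtyPage {
        final long firstDirtyLsn;
        DirtyPage(long lsn) { this.firstDirtyLsn = lsn; }
    }

    long redoMark = 0;      // recovery starts scanning the log from here
    int pagesPerPiece = 3;  // how many page writes make up one "piece"

    /** Runs one piece of the checkpoint; returns true if work remains. */
    boolean checkpointPiece(Deque<DirtyPage> dirtyList, boolean ioBusy) {
        if (ioBusy) {
            return !dirtyList.isEmpty();              // suspend; resume later
        }
        int written = 0;
        while (written < pagesPerPiece && !dirtyList.isEmpty()) {
            writePageToDisk(dirtyList.removeFirst()); // oldest-dirtied first
            written++;
        }
        // Every page dirtied before the new head's first-dirty LSN is now on
        // disk, so the REDO mark can safely advance to that point.
        redoMark = dirtyList.isEmpty() ? currentLogEnd()
                                       : dirtyList.peekFirst().firstDirtyLsn;
        updateLogControlFile(redoMark);  // makes this piece an intact checkpoint
        return !dirtyList.isEmpty();
    }

    void writePageToDisk(DirtyPage p) { /* force the page image to the data file */ }
    void updateLogControlFile(long mark) { /* durably record the new REDO mark */ }
    long currentLogEnd() { return 100; } // stand-in for the log's current tail
}
```

Because pages are written in first-dirtied order, the head of the list always
bounds how far back redo recovery must scan, so each update of the log
control file durably shortens recovery.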

To establish a dirty page list is just my suggestion. Its purpose is to
sort the dirty pages in ascending order of the time when they were first
updated. If we can find another way to do that without using extra
memory, we don't have to establish such a list.
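As one concrete illustration, the list could be as simple as an
insertion-ordered map. This is only a sketch, under the assumption that the
buffer manager can hook page-dirtied and page-written events; DirtyPageList
and its method names are made up, not part of Derby.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

class DirtyPageList {
    // pageId -> LSN of the update that first dirtied the page.
    // LinkedHashMap iterates in insertion order, i.e. first-update order.
    private final LinkedHashMap<Long, Long> pages = new LinkedHashMap<>();

    /** Called on every page update; only the first update inserts. */
    void pageDirtied(long pageId, long firstDirtyLsn) {
        pages.putIfAbsent(pageId, firstDirtyLsn); // re-dirty: keep old position
    }

    /** Called after a page is written out (by checkpoint or background I/O). */
    void pageWritten(long pageId) {
        pages.remove(pageId);
    }

    /** The page the checkpoint should write next, or null if none are dirty. */
    Long oldestDirtyPage() {
        Iterator<Long> it = pages.keySet().iterator();
        return it.hasNext() ? it.next() : null;
    }

    int size() { return pages.size(); }
}
```

A nice side effect of putIfAbsent: a re-dirtied page keeps its original
position, so repeated updates never push a page toward the tail, and a
background I/O thread consulting the same head can avoid rewriting a page the
checkpoint has already written.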

Does anyone have any suggestions on this? Everyone is welcome to give an
opinion on it.



Raymond



>From: Mike Matrigali <mikem_app@sbcglobal.net>
>Reply-To: <derby-dev@db.apache.org>
>To: derby-dev@db.apache.org
>Subject: Re: Discussion of incremental checkpointing
>Date: Tue, 07 Feb 2006 13:00:23 -0800
>
>I am hoping maybe someone else will get in this discussion, I
>am not seeing the benefit vs runtime cost of the incremental checkpoint
>approach.
>It could be I am blinded by the systems that I have worked on.  I
>just have this gut reaction to adding another queue to the I/O
>system, going forward I would rather see Derby more parallel and
>a single queue seems to be the wrong direction.
>
>Again what problem are you trying to solve? My guesses:
>1) avoid checkpoint I/O saturation (I think this needs to be done but
>   this can be done in current checkpoint scheme).
>2) reduce number of redo recovery log records (how important is this,
>   at cost to runtime list manipulation)
>3) some other problem?
>
>I think you are trying to solve 2 - which I admit I don't see as much
>of a problem.  Currently at a high level (ignoring details) we do:
>1) start checkpoint, note current end of log file (LOGEND)
>2) we should slowly write all dirty pages in cache (I agree we
>    need a fix in this area to current scheme)
>3) when done write checkpoint log record indicating REDO mark at LOGEND,
>    now log may be at LOGEND + N
>
>I think what you want is at step 2 to somehow write multiple checkpoint
>log records rather than wait to the end.  Let's assume previous REDO
>MARK was LOGEND_PREV.  So while writing the pages you would write a new
>type of checkpoint record that would move the REDO mark up to somewhere
>between LOGEND_PREV and current end of log file.  I think I agree that
>if you wrote a checkpoint record for every I/O from your ordered list
>then you would guarantee minimum redo recovery time, where the current
>system writes one log record at end of all I/O's which at the end of
>writing all pages would match your approach (a little hard to compare
>as your approach is continuous but if you just compare the dirty page
>list at LOGEND I think this is true).
>
>So again, let's first talk about what you want to solve rather than
>how to solve it.  Maybe you have some assumptions about what type of
>system you are trying to address, like:  size of cache, percentage of
>dirty pages, how long it takes to do a checkpoint vs. recovery time
>requirements of the applications?
>
>Raymond Raymond wrote:
>
> >
> >> From: Mike Matrigali <mikem_app@sbcglobal.net>
> >>
> >> I think this is the right path, though would need more details:
> >> o does boot mean first time boot for each db?
> >> o how to determine "this machine"
> >> o and the total time to run such a test.
> >>
> >> There are some very quick and useful tests that would be fine to
> >> add to the default system and do one time per database.  Measuring
> >> how long to do a commit and how long to do a single database read from
> >> disk would be fine.  Seems like
> >> just these 2 numbers could be used to come up with a very good
> >> default estimate of log recovery time per log record.  Then as you
> >> propose the actual estimate can be improved by measuring real
> >> recovery time in the future.
> >>
> >> I am not convinced of the need for the bigger test, but if the default
> >> is not to run it automatically and it is your "itch" to have such
> >> a configuration option then I would not oppose.  I do see great value
> >> in coming up with a very good default estimate of recovery time
> >> based on outstanding number of log records.  And
> >> I even envision
> >> a framework in the future where derby would schedule other non-essential
> >> background tasks that have been discussed in the
> >>
> >> On a different track I am still unclear on the checkpoint dirty page
> >> lru list.  Rather than talk about implementation details, I would
> >> like to understand the problem you are trying to solve.  For instance
> >> I well understand the goal to configure checkpoints such that they
> >> map to user understandable concept of the tradeoff of current runtime
> >> performance vs. how long am I willing to wait the next time I boot
> >> the database after a crash.
> >>
> >> What is the other problem you are looking at.
> >>
> >
> >
> > Mike:
> >
> > What I am looking at next is to redesign the checkpointing process.
> > The current checkpointing mechanism writes out all the dirty pages
> > during the checkpoint, which causes a burst of disk I/O. Lots of problems
> > were mentioned by some people, such as DERBY-799 reported by Oystein.
> > I have a proposal of incremental checkpointing. I have mentioned it
> > before; I would like to explain it in more detail.
> >
> > We should find some way to sort the dirty pages in ascending order of
> > the time when they were first updated. The incremental checkpointing
> > process will continually write out the dirty pages from the earliest
> > updated dirty page to the latest updated dirty page. The writing rate
> > is related to the system situation.
> > There are two situations in which we will update the log control file:
> > 1) A data read or a log write starts to have a longer response time
> > than an acceptable value; we update the log control file and sleep for
> > a while.
> > 2) After writing out a certain number of dirty pages.
> >
> > The benefits of it are:
> > 1) Since we write out dirty pages from the earliest updated page to the
> > latest updated page, the checkpoint instance will keep advancing. Since
> > the incremental checkpoint is performed continuously, the checkpoint
> > instance will be much closer to the tail of the log than with the
> > conventional checkpointing.
> > 2) The checkpointing process can be paused if the disk I/O becomes really
> > busy, and the finished part is an intact checkpoint instance.
> >
> > Do you still remember I suggested to establish a dirty page
> > list in which dirty pages are sorted in ascending order of the time when
> > they were first updated? I would like to discuss it again.
> >
> > Actually the list is not designed to speed up the checkpoint process. It
> > is for the incremental checkpointing described above. To make the
> > checkpoint instance keep advancing, we should guarantee the earlier
> > updated pages have been written out. That's why I suggested establishing
> > such a list.
> >
> > In the last discussion, you also mentioned a problem:
> >
> > MM  The downside with the
> > MM  current
> > MM  algorithm is that a page that is made dirty after the checkpoint
> > MM  starts
> > MM  will be written, and if it gets written again before the next
> > MM  checkpoint
> > MM  we have done 1 too many I/O's.  I think I/O optimization may benefit
> > MM  more by working on optimizing the background I/O thread than working
> > MM  on the checkpoint.
> >
> >
> > If the background I/O thread can refer to this list, I think it can help
> > solve the problem you mentioned. I am not very familiar with the
> > background I/O thread. If I am wrong, please point it out.
> >
> > In the list, the dirty pages are sorted in ascending order of the time
> > when they were first updated, which means the oldest dirty page is at
> > the head of the list and the latest updated dirty page is at the end of
> > the list.
> > The operations on the list are:
> > - When a page is updated and it is not in the list, we will append it to
> > the end of the list.
> > - When a dirty page in the list is written out to disk, it will be
> > released from the list.
> >
> > Let's look into your problem:
> >  if a page is made dirty after the checkpoint starts,
> >
> > 1) If the page was made dirty before this update, it is supposed to be
> >  in the list already; we don't need to add it again.
> >  When the checkpoint process writes this dirty page out to disk, it will
> >  be released from the list, and if the background I/O thread refers to
> >  the list, it will know there is no need to write this page out again.
> > 2) If the page was updated for the first time, it will be appended to
> >  the end of the list. If the background I/O thread refers to the list,
> >  it knows there is no need to write this page out so soon, since it has
> >  just been updated.
> >
> >
> > Is it reasonable?
> >
> >
> > Raymond
> >


