db-derby-dev mailing list archives

From Oystein.Grov...@Sun.COM (Øystein Grøvlen)
Subject Re: A proposal on "how to determine if derby is busy"
Date Tue, 25 Oct 2005 12:09:22 GMT
>>>>> "RR" == Raymond Raymond <raymond_derby@hotmail.com> writes:

    RR> I have a proposal:
    RR> Since the disk IO is the bottleneck of the system performance,
    RR> we should consider it more than others, such as cpu usage.
    RR> Especially in the checkpointing issue, most of the workload of
    RR> checkpointing is to write out data to disk. So, instead of determining
    RR> if derby is busy, we can try to determine if derby IO is busy.
    RR> Is it possible to determine if derby IO is busy by figuring out how
    RR> many pages(cache pages) derby writes out to disk in every time
    RR> unit or how much time it takes derby to write out one page to
    RR> disk? I found derby writes out data to disk by pages. If we can
    RR> know how many pages derby writes out in every time unit, we
    RR> can determine if derby IO is busy to some extent. E.g., if derby
    RR> writes out lots of pages every second, we can say derby IO is busy.
    RR> If derby writes out a few pages every second, we can say derby IO
    RR> is not busy. Alternatively, we can determine if derby IO is busy
    RR> by figuring out how much time it takes derby to write one cache page.
    RR> If the time is long, it means the disk IO is busy. If the time is short,
    RR> it means the disk IO is not busy.

    RR> Does that sound reasonable? Everyone is welcome to comment.

Note that write performance is not a good metric when the log is
placed on another disk than the data.  In many (most?) of those
systems, the checkpointing thread will be the only thread that writes
pages to the data disk.  In addition, the checkpointer does not sync
its pages to disk.  They will only be written to the file system buffer.
Reads, on the other hand, will need to access the disk (unless the
page is cached in the file system after a previous write).  Hence,
read activity is a much better measure of I/O activity.  Reads also
have a direct impact on transaction response times, while writes only
affect transaction response times if the page cache has no clean
pages.
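
To make the idea of using read activity as the metric concrete, here
is a minimal sketch that keeps an exponentially weighted moving
average of data read latencies and reports "busy" when it exceeds a
threshold.  The class name, the smoothing factor and the threshold
are all hypothetical; nothing like this exists in Derby as far as
this message is concerned.

```java
// Hypothetical helper, not part of Derby: tracks a moving average of
// data read latencies as a rough indicator of I/O load.
final class ReadLatencyMonitor {
    private final double alpha;               // weight of the newest sample, 0 < alpha <= 1
    private final double busyThresholdMillis; // assumed tuning knob
    private double average;
    private boolean seeded;

    ReadLatencyMonitor(double alpha, double busyThresholdMillis) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.busyThresholdMillis = busyThresholdMillis;
    }

    // Record the latency of one data page read.
    void record(double readMillis) {
        if (!seeded) {
            average = readMillis;
            seeded = true;
        } else {
            average = alpha * readMillis + (1.0 - alpha) * average;
        }
    }

    double average() {
        return average;
    }

    // True once at least one read was seen and the average is above the threshold.
    boolean isBusy() {
        return seeded && average > busyThresholdMillis;
    }
}
```

A checkpointer could consult isBusy() before writing its next batch
of pages and back off while reads are slow.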

Taking a step back, the following I/O activity needs to be prioritized
for optimal performance:
        - Log writes
        - Data reads
The following has lower priority:
        - Data writes
        - Log reads (for rollback)

Separating log and data is good for at least two reasons:
        - Improves performance of log writes since data reads will not
          ruin the locality of writes. (Log is written sequentially).
        - Separating the two high priority operations makes it
          possible to optimize one type of high priority I/O
          operations without impacting the other one.

Derby needs to work well both when log and data are on the same disk
and when they are on separate disks.  However, it might be acceptable
to say that you will only get the full value of checkpoint
optimizations if you separate log and data.

In Derby there are three ways data pages may be written to disk:
        - During checkpoint
        - By the background cleaner when almost all pages are dirty
        - By a user thread if no clean page can be found.

The last option is more expensive since pages are written to disk
one at a time.  The background cleaner should make sure this will not
happen.  The background cleaner and the checkpointer will write
several pages at a time.  This makes it possible for the file system
and the disk to optimize I/O by reordering the writes.

The objectives of the background cleaner and the checkpointer are
similar, but not quite the same.  While the background cleaner wants
to write pages so that they can be evicted from the cache, the
checkpointer needs to guarantee that all pages that were dirty at the
checkpoint time are written to disk before the log control file is
updated.  The background cleaner will not write recently used pages
since they are not candidates for eviction.

The demands on the write rate are also different for the two:
        - Background cleaner:  Needs to keep up with the eviction
          rates of the system.
        - Checkpointing: Needs to satisfy requirements for recovery
          time.

Which operation needs the higher write rate will depend a lot on the
load and the size of the page cache.

What often happens in Derby today is that writes of data pages are
bursty.  Between checkpoints hardly any pages are written to disk.
Ideally, the write rate should be kept pretty stable over the whole
period between two checkpoints.  That is, when a checkpoint is
finished a new one is immediately started.  This means that given a
fixed load, increasing the checkpoint interval should decrease the
write rate.
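
As a rough illustration of that last point, the sketch below computes
the steady write rate needed to flush a given number of dirty pages
over one checkpoint interval.  The class and method names are
hypothetical, and it assumes the number of distinct dirty pages per
checkpoint stays roughly the same when the interval grows.

```java
// Hypothetical helper, not part of Derby.
final class CheckpointWriteRate {
    // Pages per second needed to spread dirtyPages evenly over the interval.
    static double pagesPerSecond(long dirtyPages, double intervalSeconds) {
        if (dirtyPages < 0 || intervalSeconds <= 0.0) {
            throw new IllegalArgumentException("need dirtyPages >= 0 and intervalSeconds > 0");
        }
        return dirtyPages / intervalSeconds;
    }
}
```

For example, 6000 dirty pages over a 60 second interval need 100
pages per second; doubling the interval to 120 seconds halves that
to 50 pages per second.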

Today, the checkpoint interval is determined by the amount of
generated log (default 10 MB).  I have no idea how long the recovery
time will usually be in that case.  Without knowing that, it is
difficult to know how much it is acceptable to increase the interval.
A study of that would be nice.  It would also be much easier for a
user to configure an approximate recovery time than some magic number
of megabytes of log.

I would like to suggest the following:
        - The user may be able to configure a certain recovery time
          that Derby should try to satisfy.  (An appropriate default
          must be determined).
        - During initialization of Derby, we run some measurement that
          determines the performance of the system and maps the
          recovery time into some X megabytes of log.
        - A checkpoint is made by default every X megabytes of log.
        - One tries to dynamically adjust the write rate of the
          checkpoint so that the writing takes an entire checkpoint
          interval.  (E.g., write Y pages, then pause for some time).
        - If data reads or log writes (if the log is in the default
          location) start to have long response times, one can increase
          the checkpoint interval.  The user should be able to turn this
          feature off in case longer recovery times are not acceptable.

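
The arithmetic behind the first four points can be sketched as
follows.  This is only an illustration under stated assumptions: the
class, the method names, the measured log replay rate and the
per-batch write time are all hypothetical inputs, not existing Derby
code or properties.

```java
// Hypothetical planner, not part of Derby.
final class CheckpointPlanner {
    // Map a target recovery time to X megabytes of log, assuming recovery
    // replays log at a measured, roughly constant rate.
    static double logMegabytesFor(double recoverySeconds, double replayMegabytesPerSecond) {
        if (recoverySeconds <= 0.0 || replayMegabytesPerSecond <= 0.0) {
            throw new IllegalArgumentException("inputs must be positive");
        }
        return recoverySeconds * replayMegabytesPerSecond;
    }

    // Pause after each batch of batchSize (Y) pages so that writing all
    // dirty pages takes about one checkpoint interval.  Returns 0 if the
    // writes alone already fill or exceed the interval.
    static long pauseMillis(long dirtyPages, int batchSize,
                            long intervalMillis, long writeMillisPerBatch) {
        if (dirtyPages < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("need dirtyPages >= 0 and batchSize > 0");
        }
        long batches = (dirtyPages + batchSize - 1) / batchSize; // round up
        if (batches == 0) {
            return 0;
        }
        long idleMillis = intervalMillis - batches * writeMillisPerBatch;
        return Math.max(0L, idleMillis / batches);
    }
}
```

With a 30 second recovery target and a replay rate of 2 MB/s, this
gives X = 60 MB of log.  With 1000 dirty pages, batches of 100
pages, a 60 second interval and 1 second of write time per batch,
the checkpointer would pause about 5 seconds after each batch.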
Hope this rambling has some value,

