db-derby-dev mailing list archives

From Daniel John Debrunner <...@debrunners.com>
Subject Re: Derby I/O issues during checkpointing
Date Tue, 01 Nov 2005 17:19:26 GMT
Øystein Grøvlen wrote:

> Some tests runs we have done show very long transaction response times
> during checkpointing.  This has been seen on several platforms.  The
> load is TPC-B like transactions and the write cache is turned off so
> the system is I/O bound.  There seems to be two major issues:

Nice investigation, I think I have seen similar problems on Windows.

> 1. Derby does checkpointing by writing all dirty pages by
>    RandomAccessFile.write() and then do file sync when the entire
>    cache has been scanned.  When the page cache is large, the file
>    system buffer will overflow during checkpointing, and occasionally
>    the writes will take very long.  I have observed single write
>    operations that took almost 12 seconds.  What is even worse is that
>    during this period also read performance on other files can be very
>    bad.  For example, reading an index page from disk can take close
>    to 10 seconds when the base table is checkpointed.  Hence,
>    transactions are severely slowed down.
>    I have managed to improve response times by flushing every file for
>    every 100th write.  Is this something we should consider including
>    in the code?  Do you have better suggestions?

Sounds reasonable.
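Roughly what I picture for the "sync every 100th write" idea, as a sketch (class and constant names are illustrative, not Derby's): instead of writing the entire dirty-page set and syncing once at the end, force the file to disk periodically so the OS buffer cache never accumulates a huge backlog of unflushed pages.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch of periodic syncing during a checkpoint.
// Bounding the unflushed data keeps individual writes (and reads on
// other files) from stalling behind a massive flush at the end.
class CheckpointWriter {
    private static final int SYNC_INTERVAL = 100; // writes between syncs
    private int writesSinceSync = 0;

    void writePage(RandomAccessFile file, long offset, byte[] page)
            throws IOException {
        file.seek(offset);
        file.write(page);
        if (++writesSinceSync >= SYNC_INTERVAL) {
            file.getFD().sync();   // push buffered pages to disk now
            writesSinceSync = 0;
        }
    }
}
```

The interval would presumably need tuning; 100 is just the value from the experiment described above.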

> 2. What makes thing even worse is that only a single thread can read a
>    page from a file at a time.  (Note that Derby has one file per
>    table). This is because the implementation of RAFContainer.readPage
>    is as follow:
>         synchronized (this) {  // 'this' is a FileContainer, i.e. a file object
>             fileData.seek(pageOffset);  // fileData is a RandomAccessFile
>             fileData.readFully(pageData, 0, pageSize);
>         }
>    During checkpoint when I/O is slow this creates long queues of
>    readers.  In my run with 20 clients, I observed read requests that
>    took more than 20 seconds.

Hmmm, I think that code was written assuming the call would not take
that long!
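For what it's worth, the seek+read pair is the only reason the lock is needed at all. java.nio's FileChannel takes the offset as an argument to read(), so concurrent readers on the same channel need no shared lock. A sketch of that alternative (names are mine, not Derby's):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Positional reads via FileChannel: no synchronized block, so readers
// on the same container are not serialized behind one another.
class PageReader {
    private final FileChannel channel;
    private final int pageSize;

    PageReader(Path file, int pageSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.pageSize = pageSize;
    }

    byte[] readPage(long pageNumber) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(pageSize);
        long offset = pageNumber * pageSize;
        while (buf.hasRemaining()) {
            // read(buf, pos) is safe to call from multiple threads
            int n = channel.read(buf, offset + buf.position());
            if (n < 0) throw new IOException("EOF at page " + pageNumber);
        }
        return buf.array();
    }
}
```

Whether the underlying implementation actually issues the reads concurrently is platform-dependent, but at least the Java-level bottleneck goes away.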

>    This behavior will also limit throughput and can partly explain
>    why I get low CPU utilization with 20 clients.  All my TPC-B
>    clients are serialized since most will need 1-2 disk accesses
>    (index leaf page and one page of the account table).
>    Generally, in order to make the OS able to optimize I/O, one should
>    have many outstanding I/O calls at a time.  (See Frederiksen,
>    Bonnet: "Getting Priorities Straight: Improving Linux Support for
>    Database I/O", VLDB 2005).  
>    I have attached a patch where I have introduced several file
>    descriptors (RandomAccessFile objects) per RAFContainer.  These are
>    used for reading.  The principle is that when all readers are busy,
>    a readPage request will create a new reader.  (There is a maximum
>    number of readers.)  With this patch, throughput was improved by
>    50% on linux.  The combination of this patch and the synching for
>    every 100th write, reduced maximum transaction response times by
>    90%.

Only concern would be the number of open file descriptors, as others have
pointed out. Might want to scavenge open descriptors from containers
that are no longer heavily used.
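As I read the description of the patch, the per-container reader pool would look something like this sketch (the names and the waiting policy are my assumptions, not the patch's actual code): a request takes an idle reader, opens a new descriptor if none is free and the cap has not been reached, and otherwise waits.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.Deque;

// Bounded pool of read-only descriptors for one container file.
// Descriptors are opened lazily, so lightly-used containers pay
// nothing; a scavenger could close idle ones to cap total fd usage.
class ReaderPool {
    private final File file;
    private final int maxReaders;
    private final Deque<RandomAccessFile> idle = new ArrayDeque<RandomAccessFile>();
    private int openCount = 0;

    ReaderPool(File file, int maxReaders) {
        this.file = file;
        this.maxReaders = maxReaders;
    }

    synchronized RandomAccessFile acquire()
            throws IOException, InterruptedException {
        while (idle.isEmpty() && openCount >= maxReaders) {
            wait();                      // all readers busy: queue up
        }
        if (!idle.isEmpty()) {
            return idle.pop();           // reuse an idle descriptor
        }
        openCount++;                     // below the cap: open a new one
        return new RandomAccessFile(file, "r");
    }

    synchronized void release(RandomAccessFile reader) {
        idle.push(reader);
        notify();                        // wake one waiting reader
    }
}
```

The maximum pool size is where the fd-count concern above bites: the global limit is (open containers) x (maxReaders), so a scavenger for idle descriptors seems worth having.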

>    The patch is not ready for inclusion into Derby, but I would like
>    to hear whether you think this is a viable approach.

It seems like these changes are low risk and enable worthwhile
performance increases without completely changing the I/O system.
Such changes would then set the performance bar that a full async
rewrite would have to beat (or at least match).

