db-derby-dev mailing list archives

From Oystein.Grov...@Sun.COM (Øystein Grøvlen)
Subject Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()
Date Fri, 16 Dec 2005 15:23:38 GMT
>>>>> "MM" == Mike Matrigali <mikem_app@sbcglobal.net> writes:

    MM> user thread initiated read
    MM>      o should be high priority and should be "fair" with other user
    MM>        initiated reads.

    MM>      o These happen anytime a read of a row causes a cache miss.
    MM>      o Currently only one I/O operation to a file can happen at a time,
    MM>        which could be a big problem for some multi-threaded,
    MM>        highly concurrent apps with a low number of tables.  I think
    MM>        the path here should be to increase the number of
    MM>        concurrent I/O's allowed to be outstanding by allowing
    MM>        each thread to have 1 (assuming sufficient open file
    MM>        resources).  100 outstanding I/O's to a single file may
    MM>        be overkill, but in java we can't know that the file is
    MM>        not actually 100 disks underneath.  The number of I/O's
    MM>        should grow as the actual application load increases,
    MM>        note I still think max I/O's should be tied to number
    MM>        of user threads, plus maybe a small number for
    MM>        background processing.

There was an interesting paper at the last VLDB conference that
discussed the virtue of having many outstanding I/O requests:
    http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf (paper)
    http://www.vldb2005.org/program/slides/wed/s1116-hall.pdf (slides)

The basic message is that many outstanding requests are good.  The
SCSI controller they used in their study was able to handle 32
concurrent requests.  One reason database systems have been
conservative with respect to outstanding requests is that they want to
control the priority of the I/O requests.  We would like user thread
initiated requests to have priority over checkpoint initiated writes.
(The authors suggest building priorities into the file system to solve
this.)

I plan to start working on a patch for allowing more concurrency
between readers within a few weeks.  The main challenge is to find the
best way to organize the open file descriptors (reusing them, limiting
the maximum number, etc.).  I will file a JIRA for this.
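
To make the descriptor question concrete, here is a minimal sketch of
one possible organization: a fixed-size pool of read-only descriptors
per container file, so that up to maxHandles reads can be outstanding
at once.  This is not the planned patch; ReadHandlePool, maxHandles and
pageSize are hypothetical names chosen for illustration, and the sketch
uses newer Java syntax than Derby currently targets.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: bounded pool of read-only descriptors for one file. */
class ReadHandlePool implements AutoCloseable {
    private final BlockingQueue<RandomAccessFile> idle;
    private final int pageSize;

    ReadHandlePool(File file, int maxHandles, int pageSize) throws IOException {
        this.idle = new ArrayBlockingQueue<>(maxHandles);
        this.pageSize = pageSize;
        // Opened eagerly for simplicity; a real pool would probably open
        // descriptors lazily as the load grows and clean up on failure.
        for (int i = 0; i < maxHandles; i++) {
            idle.add(new RandomAccessFile(file, "r"));
        }
    }

    /** Reads one page into dest; blocks while all descriptors are in use. */
    void readPage(long pageNumber, byte[] dest)
            throws IOException, InterruptedException {
        RandomAccessFile raf = idle.take();
        try {
            raf.seek(pageNumber * pageSize);
            raf.readFully(dest, 0, pageSize);
        } finally {
            idle.add(raf); // return the descriptor for reuse
        }
    }

    @Override
    public void close() throws IOException {
        RandomAccessFile raf;
        while ((raf = idle.poll()) != null) {
            raf.close();
        }
    }
}
```

Each reader holds its own descriptor only for the duration of one
seek-and-read, so readers do not need to synchronize on a shared file
position.  The pool size is the knob that limits the number of open
descriptors.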

I also think we should consider mechanisms for read ahead.

    MM> user thread initiated write
    MM>       o same issues as user initiated read.
    MM>       o happens way less than read, as it should only happen on a cache
    MM>         miss that can't find a non-dirty page in the cache.  background
    MM>         cache cleaner should be keeping this from happening, though
    MM>         apps that only do updates and cause cache hits are worst case.


    MM> checkpoint initiated write:
    MM>       o sometimes too many checkpoints happen in too short a time.
    MM>       o needs an improved scheduling algorithm, currently just defaults
    MM>         to N bytes written to the log file no matter what the speed of
    MM>         log writes is.
    MM>       o currently may flood the I/O system causing user reads/writes to
    MM>         stall - on some OS/JVMs this stall is amazing, like tens of
    MM>         seconds.
    MM>       o It is not important that checkpoints run fast, it is more
    MM>         important that they proceed methodically to conclusion while
    MM>         causing little interruption to "real" work by user threads.
    MM>         Various approaches to this were discussed, but no patches yet.

For the scheduling of checkpoints, I was hoping Raymond would come up
with something.  Raymond, are you still with us?

I have discussed our I/O architecture with Solaris engineers, and I
was told that our approach of doing buffered writes followed by an
fsync is the worst approach on Solaris.  They recommended using direct
I/O.  I guess there will be situations where single-threaded direct I/O
for checkpointing will give too low throughput.  In that case, we could
consider a pool of writers.  The challenge would then be how to give
priority to user-initiated requests over multi-threaded checkpoint
writes as discussed above.
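
One way to picture that prioritization, purely as a sketch and not a
proposal for the actual design: all I/O requests go through a single
priority queue that the worker threads drain, with user-initiated
requests ordered ahead of checkpoint writes and FIFO order within each
class to keep user requests "fair".  PrioritizedIoQueue, Source and
Request are hypothetical names.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: user requests are always dequeued before checkpoint writes. */
class PrioritizedIoQueue {
    /** Declaration order defines priority: USER sorts before CHECKPOINT. */
    enum Source { USER, CHECKPOINT }

    static final class Request implements Comparable<Request> {
        final Source source;
        final long seq;     // submission order, used as a FIFO tie-breaker
        final Runnable io;  // the actual read or write to perform

        Request(Source source, long seq, Runnable io) {
            this.source = source;
            this.seq = seq;
            this.io = io;
        }

        @Override
        public int compareTo(Request other) {
            int byPriority = source.compareTo(other.source);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
    private final AtomicLong nextSeq = new AtomicLong();

    void submit(Source source, Runnable io) {
        queue.put(new Request(source, nextSeq.getAndIncrement(), io));
    }

    /** Called by each I/O worker thread; blocks until a request is available. */
    Request take() throws InterruptedException {
        return queue.take();
    }
}
```

Note that strict priority like this lets a steady stream of user
requests starve the checkpoint indefinitely, and it only controls the
order in which requests are issued, not what the OS or disk does with
requests already outstanding.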

-- 
Øystein

