db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()
Date Fri, 16 Dec 2005 17:43:39 GMT
Excellent, I look forward to your work on concurrent I/O.  I am likely
not to be on the list much for the next 2 weeks, so I won't be able to
help much.  In thinking about this issue I was hoping that somehow
the current container cache could be enhanced to support more than
one open container object per container.  Then one would automatically
get control over the open file resource across all containers by setting
the currently supported "max" on the container pool.

The challenge is that this would be a new concept for the basic services
cache implementation.  What we want is a cache that supports multiple
objects with the same key, returns an available one if another is
"busy", and returns a newly opened one if all are busy.  I am going
to start a thread on this, to see if any other help is available.
If possible, I like this approach better than having a queue of open
files per container, where it is hard to control the growth of one
queue vs. the growth in another.

On the checkpoint issue, I would not have a problem with changes to the
current mechanism to do "rwd"-type sync I/O rather than syncing at the
end (but we will have to support both until we no longer have to support
older JVMs).  I believe this is as close to "direct I/O" as we can get
from Java - if you mean something different here, let me know.  The
benefit is that I believe it will fix the problem of the checkpoint
flooding the I/O system.  The downside is that the total number of I/Os
will increase in cases where the Derby block size is smaller than the
filesystem/disk block size - assuming the OS currently converts our
flood of multiple async writes to the same file into a smaller number of
bigger I/Os.  I think this trade-off is fine for checkpoints.  If
checkpoint efficiency becomes an issue, there are a number of other ways
to address it in the future.
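For reference, a minimal sketch of the two write strategies in Java (file names and page size are invented for illustration; Derby's real container I/O is more involved):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the two checkpoint write strategies discussed above.
public class SyncWriteDemo {
    static final int PAGE_SIZE = 4096;  // illustrative page size

    // "rwd": every write() is forced to the device before it returns,
    // so a checkpoint trickles pages out one synchronous I/O at a time
    // instead of piling dirty pages into the OS cache.
    static void writePageSynced(File f, long offset, byte[] page) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rwd")) {
            raf.seek(offset);
            raf.write(page);
        }
    }

    // "rw" plus sync at the end: buffered writes, then one forced flush,
    // which is the pattern that can flood the I/O system when many dirty
    // pages hit the disk at once.
    static void writePageBuffered(File f, long offset, byte[] page) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(offset);
            raf.write(page);
            raf.getFD().sync();   // force file contents to the device at the end
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("derby-demo", ".dat");
        f.deleteOnExit();
        writePageSynced(f, 0, new byte[PAGE_SIZE]);
        writePageBuffered(f, PAGE_SIZE, new byte[PAGE_SIZE]);
        System.out.println(f.length());   // 8192
    }
}
```

Both end with the data on disk; the difference is whether the forced write happens per call ("rwd") or once at the end (sync), which is what changes the burstiness of the I/O.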

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mikem_app@sbcglobal.net> writes:
>     MM> user thread initiated read
>     MM>      o should  be high priority and  should be "fair"  with other user
>     MM>        initiated reads.
>     MM>      o These happen anytime a read of a row causes a cache miss.
>     MM>      o Currently only one I/O operation to a file can happen at
>     MM>        a time, which could be a big problem for some
>     MM>        multi-threaded, highly concurrent apps that use a small
>     MM>        number of tables.  I think the path here should be to
>     MM>        increase the number of concurrent I/Os allowed to be
>     MM>        outstanding by allowing each thread to have 1 (assuming
>     MM>        sufficient open file resources).  100 outstanding I/Os
>     MM>        to a single file may be overkill, but in Java we can't
>     MM>        know that the file is not actually 100 disks underneath.
>     MM>        The number of I/Os should grow as the actual application
>     MM>        load increases; note that I still think max I/Os should
>     MM>        be tied to the number of user threads, plus maybe a
>     MM>        small number for background processing.
> There was an interesting paper at the last VLDB conference that
> discussed the virtue of having many outstanding I/O requests:
>     http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf (paper)
>     http://www.vldb2005.org/program/slides/wed/s1116-hall.pdf (slides)
> The basic message is that many outstanding requests are good.  The
> SCSI controller they used in their study was able to handle 32
> concurrent requests.  One reason database systems have been
> conservative with respect to outstanding requests is that they want to
> control the priority of the I/O requests.  We would like user thread
> initiated requests to have priority over checkpoint initiated writes.
> (The authors suggest building priorities into the file system to solve
> this.)
> I plan to start working on a patch for allowing more concurrency
> between readers within a few weeks.  The main challenge is to find the
> best way to organize the open file descriptors (reuse, limiting the
> max number, etc.).  I will file a JIRA for this.
> I also think we should consider mechanisms for read ahead.
>     MM> user thread initiated write
>     MM>       o same issues as user initiated read.
>     MM>       o happens way less than read, as it should only happen on a cache
>     MM>         miss that can't find a non-dirty page in the cache.  background
>     MM>         cache cleaner  should be  keeping this from  happening, though
>     MM>         apps that only do updates and cause cache hits are worst case.
>     MM> checkpoint initiated write:
>     MM>       o sometimes too many checkpoints happen in too short a time.
>     MM>       o needs an improved scheduling algorithm; currently it just
>     MM>         defaults to N bytes written to the log file, no matter
>     MM>         what the speed of log writes is.
>     MM>       o currently may flood the I/O system, causing user
>     MM>         reads/writes to stall - on some OS/JVMs this stall is
>     MM>         amazingly long, like tens of seconds.
>     MM>       o It is not important that checkpoints run fast; it is more
>     MM>         important that they proceed methodically to conclusion
>     MM>         while causing little interruption to "real" work by user
>     MM>         threads.  Various approaches to this were discussed, but
>     MM>         no patches yet.
> For the scheduling of checkpoints, I was hoping Raymond would come up
> with something.  Raymond are you still with us?
> I have discussed our I/O architecture with Solaris engineers, and I was
> told that our approach of doing buffered writes followed by an fsync is
> the worst approach on Solaris.  They recommended using direct I/O.  I
> guess there will be situations where single-threaded direct I/O for
> checkpointing will give too low throughput.  In that case, we could
> consider a pool of writers.  The challenge would then be how to give
> priority to user-initiated requests over multi-threaded checkpoint
> writes, as discussed above.
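As a sketch of one possible direction for the concurrent-reader work (an assumption on my part, not the eventual patch), java.nio's FileChannel offers positional reads that do not touch the shared file position, so several threads can read different pages through a single open descriptor instead of taking turns around seek()+read() the way RAFContainer does today:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch (not the actual DERBY-733 patch): concurrent page reads over
// one descriptor using FileChannel's positional read, which is
// independent of the channel's current position.
public class ConcurrentPageReader {
    static final int PAGE_SIZE = 4096;  // illustrative page size
    private final FileChannel channel;

    ConcurrentPageReader(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // Safe to call from many threads at once: read(dst, position) does
    // not move any shared file pointer, so readers need not serialize.
    byte[] readPage(long pageNumber) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(PAGE_SIZE);
        long offset = pageNumber * PAGE_SIZE;
        while (buf.hasRemaining()) {
            if (channel.read(buf, offset + buf.position()) < 0) {
                break;  // EOF: partial last page
            }
        }
        return buf.array();
    }
}
```

Whether the underlying OS actually services such reads in parallel on one descriptor varies by platform, which is part of why a small per-container pool of descriptors may still be worth having.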
