db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Derby I/O issues during checkpointing
Date Wed, 02 Nov 2005 16:16:17 GMT

Øystein Grøvlen wrote:
> (Any reason this did not go to derby-dev?)
>>>>>>"MM" == Mike Matrigali <mikem_app@sbcglobal.net> writes:
>     MM> Your change to checkpoint seems like a low risk, and from your tests
>     MM> high benefit.  My only worry is those systems with a bad implementation
>     MM> of "sync" which is linearly related to size of file or size of OS disk
>     MM> cache (basically I have seen implementations where the OS does not have
>     MM> a data structure to track dirty pages associated with a file so it has
>     MM> two choices: 1) search every page in the disk cache or 2) probe the disk
>     MM> cache for every page in the file - it chooses which approach to use
>     MM> based on file size vs cache size).  I was willing to pay the cost of one
>     MM> of these calls per big file, but I think I would lean toward just using
>     MM> sync write for checkpoint given the problems you are seeing, but not
>     MM> very strongly.  With reasonable
>     MM> implementations of file sync I like your approach.
>     MM> If you go with syncing every 100, I wonder if it might make sense to
>     MM> "slow" checkpoint even more in a busy system.  Since the writes are
>     MM> not really doing I/O maybe it might make sense to give other threads
>     MM> in the system a chance more often at an I/O slot by throwing in a
>     MM> "give up my time slice" call every N writes with N being a relatively
>     MM> small number like 1-5.
> Maybe I should try to see what happens if I just make the checkpoint
> sleep for a few seconds every N writes instead of doing a sync.  It
> could be that the positive effect is mainly from slowing down the
> checkpoint when the I/O system is overloaded.  

Yes, that would be interesting.  If that helps then I think there are 
better things than sleep, but not worth coding if sleep doesn't help.
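
A minimal sketch of the two throttling ideas discussed above: sync every
100 writes, and give up the time slice every few writes. This is not
Derby's checkpoint code. The ThrottledCheckpoint class, the PageSink
interface, and the parameter names are hypothetical, made up for
illustration. Note that Thread.yield() is only a hint to the scheduler and
may do nothing on some platforms.

```java
class ThrottledCheckpoint {
    /** Hypothetical stand-in for whatever writes and syncs dirty pages. */
    interface PageSink {
        void writePage(int pageNo);
        void sync();
    }

    /**
     * Writes dirtyPages pages, syncing after every syncEvery writes and
     * yielding the CPU after every yieldEvery writes. A value <= 0
     * disables the corresponding step. Returns the number of syncs issued.
     */
    static int run(int dirtyPages, int syncEvery, int yieldEvery, PageSink sink) {
        int syncs = 0;
        int unsynced = 0;
        for (int i = 1; i <= dirtyPages; i++) {
            sink.writePage(i);
            unsynced++;
            if (syncEvery > 0 && unsynced == syncEvery) {
                sink.sync();            // bound the amount of unsynced data
                syncs++;
                unsynced = 0;
            }
            if (yieldEvery > 0 && i % yieldEvery == 0) {
                Thread.yield();         // let other threads reach an I/O slot
            }
        }
        if (unsynced > 0) {             // flush whatever is left at the end
            sink.sync();
            syncs++;
        }
        return syncs;
    }
}
```

Swapping Thread.yield() for a short Thread.sleep(...) would give the
sleep-instead-of-sync variant, which is the experiment that separates
"slowing the checkpoint down" from "syncing more often".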

Do you think your system will see the same issues if running in
durability=test mode (i.e. no syncs)?  Someday I would like to produce
a non-sync system which would guarantee consistent db recovery (just 
might lose transactions but not half of a transaction), so it would
be interesting to understand if the problem is the sync or the problem
is just the blast of unsynced writes.
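
For reference, a small sketch of how the no-sync mode could be switched on
for such an experiment, assuming the derby.system.durability system
property is set before the Derby engine boots. The DurabilityTestMode
class is a hypothetical helper, and nothing here loads Derby itself.

```java
class DurabilityTestMode {
    /**
     * Requests Derby's no-sync test mode. This only sets the JVM system
     * property; it is assumed to be called before the engine is booted.
     */
    static void enable() {
        System.setProperty("derby.system.durability", "test");
    }
}
```

The same setting could instead go in derby.properties or on the command
line as -Dderby.system.durability=test. It is meant for testing only,
since a crash in this mode may leave the database unrecoverable.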

> ...
>     MM> What is your log rate (bytes/sec to the log)?  I think you are just
>     MM> saying that the default of a checkpoint per 10 meg of log is a bad
>     MM> default for these kinds of apps.
> The first checkpoint occurs after about 5 minutes. I guess that should
> indicate a log rate of 300 kbytes/sec.
> I do not think changing the checkpoint interval would help much on the
> high response times unless you make it very short so that the number
> of pages per checkpoint is much smaller.

OK, that is not too bad - I was worried that you were generating a
checkpoint every few seconds.  Though in such an application I might
set the checkpoint rate to be more like once per hour.  Again, this is
a separate issue: no matter what the checkpoint rate, it should still be
fixed to avoid the hits you are seeing.
