db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Options for syncing of log to disk
Date Wed, 30 Aug 2006 15:04:13 GMT
Initial writes to the log go to a preallocated log file, but once it
is filled, writes can extend the log, and thus it is not safe to skip
syncing the metadata that tracks the length of the file.
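To make the point concrete: a write that stays inside the file's current length leaves the length unchanged, while a write past the end grows the file. In the second case the new length (file allocation metadata) must also reach disk before that record can be found after a crash. The following is a minimal sketch, assuming a 1 KB zero-filled file stands in for a preallocated log. The class and method names are hypothetical and are not Derby code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch only (hypothetical names, not Derby code): shows which
// log writes change the file length, i.e. the metadata that must be synced.
public class LogExtensionDemo {

    /** True if writing len bytes at pos would grow the file past its current length. */
    static boolean wouldExtend(RandomAccessFile raf, long pos, int len) throws IOException {
        return pos + len > raf.length();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".log");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write(new byte[1024]);               // "preallocate" 1 KB of real zero bytes
            byte[] rec = new byte[100];

            System.out.println(wouldExtend(raf, 0, rec.length));    // false
            raf.seek(0);
            raf.write(rec);                          // inside the preallocated space
            System.out.println(raf.length());        // 1024: length unchanged

            System.out.println(wouldExtend(raf, 1000, rec.length)); // true
            raf.seek(1000);
            raf.write(rec);                          // crosses the old end of file
            System.out.println(raf.length());        // 1100: length metadata changed
        }
    }
}
```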

Unfortunately this behavior is hardware, OS and JVM specific, and the
exact meaning of rws and rwd is left vague in the javadoc that I have
read. The javadoc usually says it syncs "metadata" but does not explain
which metadata. When I worked on this issue for another db vendor,
direct OS access usually provided 3 rather than 2 options. The 3
options were:
1) no metadata sync
2) only sync file allocation metadata
3) sync file allocation metadata and other metadata.  The problem
    is that other metadata includes the last modified time, which
    is updated on every write to the file.
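For comparison, Java exposes only two sync levels, and neither is specified in terms of the three options above. The sketch below shows both ways to get them. It assumes the standard `java.io.RandomAccessFile` modes (`"rwd"`, `"rws"`) and `java.nio.channels.FileChannel.force(boolean metaData)`. The class and helper names are hypothetical. Which metadata `"rws"` or `force(true)` actually covers is still left to the OS and JVM.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helpers showing the two sync levels the Java API offers.
public class JavaSyncLevels {

    // Option A: the open mode decides the sync level for every write.
    //   "rwd" -> each update to file content is written synchronously
    //   "rws" -> each update to content and metadata is written synchronously
    static long appendWithMode(File f, byte[] data, String mode) {
        try (RandomAccessFile raf = new RandomAccessFile(f, mode)) {
            raf.seek(raf.length());   // append at the current end of file
            raf.write(data);
            return raf.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option B: plain "rw" writes, then an explicit force.
    //   force(false) -> only content must reach the storage device
    //   force(true)  -> content and metadata must reach the storage device
    static long appendThenForce(File f, byte[] data, boolean includeMetadata) {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            FileChannel ch = raf.getChannel();
            ByteBuffer buf = ByteBuffer.wrap(data);
            long pos = ch.size();
            while (buf.hasRemaining()) {
                pos += ch.write(buf, pos);  // a positional write may be partial
            }
            ch.force(includeMetadata);
            return ch.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sync", ".dat");
        f.deleteOnExit();
        byte[] rec = new byte[10];
        System.out.println(appendWithMode(f, rec, "rwd"));   // 10
        System.out.println(appendWithMode(f, rec, "rws"));   // 20
        System.out.println(appendThenForce(f, rec, false));  // 30
        System.out.println(appendThenForce(f, rec, true));   // 40
    }
}
```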

What do you mean by "most OS"?
What OS/JVM are your numbers from?

When the sync option on the log was switched from a full file sync to
"rws" mode, tests were run which I believe included Linux (probably
only a single version of Linux, not sure which) and XP with Sun and
IBM JVMs (probably 1.4.2, as I think that was the latest JVM at the
time). I think Apple's OS was also tested, but I am not sure.

The first implementation simply switched to "rws" mode but left the
log file to grow as needed. "rws" mode was picked because it is
impossible to tell whether file allocation metadata is synced as part
of "rwd", so to guarantee transaction consistency the safest mode was
picked. Tests then showed that if we preallocated the log file, I/O to
the preallocated file that did not extend it paid only 1 I/O per sync.
So work was done to make most log I/O go to a preallocated file, but
the logging system was not changed to guarantee that all I/O goes to a
preallocated file.
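The preallocation idea can be sketched as follows. This is a hypothetical design, not Derby's logging code. The log is filled with real zero bytes up front, because `setLength()` alone may leave a sparse file whose blocks are not yet allocated. It is then opened in "rwd" mode, so in-place writes need only a data sync. Any write that extends the file is followed by `FileChannel.force(true)` so the new length is pushed to disk as well. The class name, the 8 KB fill block, and the force-on-extend policy are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch, not Derby's logging code: preallocate so the common case
// needs only content syncs ("rwd"), and add an explicit metadata sync for the
// rare write that extends the file.
public class PreallocatedLog implements AutoCloseable {
    private static final int FILL_BLOCK = 8192;  // assumed fill chunk size
    private final RandomAccessFile raf;
    private long writePos = 0;                   // next byte offset to write
    private int metadataSyncs = 0;               // writes that needed force(true)

    public PreallocatedLog(File f, long preallocBytes) throws IOException {
        // Write real zero bytes so the file's blocks are actually allocated.
        try (RandomAccessFile init = new RandomAccessFile(f, "rw")) {
            byte[] zeros = new byte[FILL_BLOCK];
            long done = 0;
            while (done < preallocBytes) {
                int n = (int) Math.min(FILL_BLOCK, preallocBytes - done);
                init.write(zeros, 0, n);
                done += n;
            }
            init.getFD().sync();                 // make the allocation durable once
        }
        raf = new RandomAccessFile(f, "rwd");    // content-only synchronous writes
    }

    public void append(byte[] record) throws IOException {
        boolean grows = writePos + record.length > raf.length();
        raf.seek(writePos);
        raf.write(record);                       // synchronous for content under "rwd"
        writePos += record.length;
        if (grows) {
            raf.getChannel().force(true);        // also push the new file length
            metadataSyncs++;
        }
    }

    public int metadataSyncs() {
        return metadataSyncs;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("prealloc", ".log");
        f.deleteOnExit();
        try (PreallocatedLog log = new PreallocatedLog(f, 1024)) {
            byte[] rec = new byte[100];
            for (int i = 0; i < 12; i++) {
                log.append(rec);                 // 1200 bytes into a 1024-byte file
            }
            System.out.println(log.metadataSyncs()); // 2: only the last two writes extend it
        }
    }
}
```

Derby's own log uses its own preallocation and switching logic. The sketch only shows why in-place writes to a preallocated file avoid the extra metadata I/O.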

It is probably worth resurrecting the simple I/O test program, to let
people run it on their various JVM/OS combinations. As has been noted
in the past, the results of such a test can be thrown way off by the
hardware involved. If the hardware/filesystem has write caching
enabled, then none of these syncs can do their job and transactions
are at risk no matter what option is picked.
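A simple I/O test along those lines could look like the following. This is a hypothetical sketch, not the original test program. It times a number of small writes into an already preallocated file for each open mode, with plain "rw" as an unsynced baseline. The record size and write count are arbitrary. As noted above, a disk or filesystem with write caching enabled will make the "rwd"/"rws" numbers look fast without making them safe.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical micro-benchmark (not the original Derby test program): time
// small writes into a preallocated file under each RandomAccessFile mode.
public class SyncModeBench {

    static double writesPerSecond(File f, String mode, int writes, int recordSize) {
        byte[] rec = new byte[recordSize];
        try {
            // Preallocate with real bytes so the timed loop never extends the file.
            try (RandomAccessFile init = new RandomAccessFile(f, "rw")) {
                init.setLength(0);
                init.write(new byte[writes * recordSize]);
                init.getFD().sync();
            }
            try (RandomAccessFile raf = new RandomAccessFile(f, mode)) {
                long start = System.nanoTime();
                for (int i = 0; i < writes; i++) {
                    raf.write(rec);  // "rwd"/"rws": returns only after the sync
                }
                long elapsed = Math.max(System.nanoTime() - start, 1);
                return writes / (elapsed / 1e9);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("bench", ".log");
        f.deleteOnExit();
        for (String mode : new String[] {"rw", "rwd", "rws"}) {
            System.out.printf("%-3s %10.0f writes/sec%n",
                    mode, writesPerSecond(f, mode, 1000, 512));
        }
    }
}
```

On hardware that honors syncs, one would expect "rws" to be no faster than "rwd". The actual ratio is exactly what such a test is meant to measure.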

Also, it is more common nowadays for higher-end hardware to have a
battery-backed cache to optimize the sync case. Such a cache returns
from the sync request instantly but still keeps transactions safe,
because it guarantees the write completes even after a failure (I've
seen this as part of the disk and as part of the controller). This
particular hardware feature works VERY well for the Derby log I/O
case, as the block being synced for the log file metadata tends to be
the same block over and over again, so the cache space needed for it
is on the order of 8k.


Olav Sandstaa wrote:
> For writing the transaction log to disk Derby uses a
> RandomAccessFile. If it is supported by the JVM, the log files are
> opened in "rws" mode making the file system take care of syncing
> writes to disk. "rws" mode will ensure that both the data and the file
> meta-data is updated for every write to the file. On most operating
> systems this leads to two write operations to the disk for every write
> issued by Derby. This limits the throughput of update-intensive
> applications.
> 
> I have run some simple tests where I have changed mode from "rws" to
> "rwd" for the Derby log file. When running a small number of
> concurrent client threads the throughput is almost doubled and the
> response time is almost halved. I am enclosing two graphs that show
> this when running a given number of concurrent "tpc-b" clients. The
> graphs show the throughput when running with "rws" and "rwd" mode when the
> disk's write cache has been enabled and disabled.
> 
> This change should also have a positive impact on the Derby startup
> time (DERBY-1664) and derbyall. With this change the time for running
> derbyall goes down by about 10-15 minutes (approximately 10%) :-)
> 
> Is anyone aware of any issues with not updating the file
> meta-data for every write? Are there any recovery scenarios where this
> can make recovery fail? Derby seems to preallocate the log file
> before starting to use it, so I think this should not influence
> the ability to find the last data written to the file after a power
> failure.
> 
> Any comments?
> 
> Thanks,
> Olav

