From: Joseph Grace
Subject: More on open("rws"/"rwd"), O_{D,}SYNC, metadata, and OSX JVM
Date: Mon, 6 Sep 2004 22:30:14 -0700
To: derby-dev@db.apache.org

Dear derby-dev:

Given the Derby issue on OSX of preallocate + "rws" file open()s failing, I did some more research (Google) on the question of "rws" metadata. I don't think it's quite as mysterious as it may have seemed at first (though there are still open questions about how O_SYNC is interpreted on OSX and other advanced operating systems). In any case, take the following with a grain of salt, and chime in if you can with additional corroboration or information.

I gather that metadata typically refers to the directory information associated with the file. So, if a database gets updated, then, e.g., the "modified time" should also be updated (in the directory entry for that file). If that's true, then:

    "rwd" updates just the contents, ensuring full data retrieval, but glosses over all but essential metadata (i.e., new block allocation metadata is handled, but directory timestamp updates are skipped).

    "rws" updates not only the contents, but also non-essential metadata (directory timestamps such as "modified time" and "access time", et al.).

I'm not sure it's that simple (since Java most likely relies on the underlying OS support for O_SYNC, O_DSYNC, or their analogues), but at least it narrows the focus a bit. I get this impression from a variety of sites, but this URL is perhaps the clearest I found:

http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/fileio.htm#wq222

under "Synchronous I/O", where it mentions:

-=-
• Specified by the O_DSYNC open flag. When a file is opened using the O_DSYNC open mode, the write () system call will not return until the file data and all file system meta-data required to retrieve the file data are both written to their permanent storage locations.
• Specified by the O_SYNC open flag. In addition to items specified by O_DSYNC, O_SYNC specifies that the write () system call will not return until all file attributes relative to the I/O are written to their permanent storage locations, even if the attributes are not required to retrieve the file data.
-=-

IOW, I believe O_DSYNC should protect data integrity even though it (purposely, for performance reasons) avoids updating all associated metadata. O_SYNC is good too, but (at least according to the above) you pay a performance penalty. So I think O_DSYNC may be a worthwhile substitute for O_SYNC as long as the incidental metadata is not all that important.

Bottom line: a production d/b with performance goals likely uses O_DSYNC (since O_SYNC is overkill if you just need to protect the data).

-=-

I also looked a bit into OSX for O_DSYNC in search of "rws"/"rwd" insights. I downloaded the sources for Darwin (OSX's BSD underpinnings). It appears that OSX only has an O_SYNC flag. The Darwin code says that O_DSYNC is not supported yet. So, in theory, O_DSYNC should degenerate to O_SYNC.

Unfortunately, even though I was able to look at the Darwin sources, I do not have the Apple Java sources to see how the flags are treated in OSX's JVM 1.4.2. (There is no JVM 1.5 (Java Tiger) (pre)release yet, so I can't test against a newer version of Java.)

Having said all that, the question still remains: why does O_SYNC behave differently than O_DSYNC in the OSX JVM (especially since only O_SYNC exists in Darwin)? I don't know. The two knee-jerk hypotheses I have are:

1. jvm:O_SYNC is using darwin:O_SYNC, but jvm:O_DSYNC is darwin:no_sync (that would be bad). So, if you need O_DSYNC, you had better use O_SYNC (which fails mysteriously when the file is preallocated).

_or_

2.
jvm:O_DSYNC uses darwin:O_SYNC (as it should), and jvm:O_SYNC uses darwin:O_SYNC and also synchronizes OSX metadata files like .DS_Store and resource forks (and ends up throwing an exception under ambiguous conditions in "rws" mode).

I don't know enough about typical O_SYNC, .DS_Store, or resource forks to know the answer to this mystery.

Bottom line: I know neither whether jvm:O_DSYNC protects data on OSX/Java 1.4.2 (as it should), nor why jvm:O_SYNC is any different from jvm:O_DSYNC on OSX/Java 1.4.2 (especially when jvm:O_DSYNC should degenerate to darwin:O_SYNC, since Darwin only has O_SYNC).

Anyway, that's probably more than enough for one post. If anyone has insight into any of these questions (e.g., anyone from the OSX Java team! ;-), please share.

Cheers,

= Joe =
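
For anyone who wants to poke at the two hypotheses on their own JVM, here is a minimal sketch of the preallocate-then-reopen pattern using java.io.RandomAccessFile with the "rwd" and "rws" modes. It is only an illustration, not Derby's actual code: the class name SyncModeSketch, the method preallocateThenWrite, and the 8192-byte size are made up for the example, and the exact sequence of calls Derby performs is an assumption. It also uses try-with-resources, so it needs Java 7 or later rather than the 1.4.2 JVM discussed above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical probe, not Derby code: preallocate a file, then reopen it in a
// synchronous mode ("rwd" or "rws") and write to it.
class SyncModeSketch {

    // Returns the file length after the synchronous write.
    static long preallocateThenWrite(File f, String mode, int preallocBytes)
            throws IOException {
        // Step 1 (assumed): preallocate with a plain "rw" open and force it to disk.
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write(new byte[preallocBytes]);
            raf.getFD().sync();
        }
        // Step 2: reopen in the requested synchronous mode.
        // "rwd" syncs file content on every write; "rws" syncs content and metadata.
        try (RandomAccessFile raf = new RandomAccessFile(f, mode)) {
            raf.seek(0);
            raf.write(42); // overwrite the first byte; length should not change
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("syncmode", ".dat");
            try {
                long len = preallocateThenWrite(f, mode, 8192);
                System.out.println(mode + ": ok, length=" + len);
            } catch (IOException e) {
                // On an affected platform, this is where a failure would surface.
                System.out.println(mode + ": failed: " + e);
            } finally {
                f.delete();
            }
        }
    }
}
```

If one mode fails and the other does not, that at least shows the JVM maps the two modes differently on that platform, though it does not by itself tell you which of the two hypotheses is correct.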