From: "Bergquist, Brett"
To: "derby-dev@db.apache.org"
Date: Tue, 19 Jul 2011 10:37:29 -0700
Subject: RE: Question on log directory of a derby database

I did get a sampling of the transaction logs over the days.
The oldest ones were July 12th and the newest ones were July 18th. I had the sysadmin copy some of these aside so I can get a sample from every couple of days.

Will the utility that you mention look at specific transaction log files? I think I saw the utility mentioned in a JIRA when I was searching last night, so I can probably find it myself.

-----Original Message-----
From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
Sent: Tuesday, July 19, 2011 1:10 PM
To: derby-dev@db.apache.org
Subject: Re: Question on log directory of a derby database

It is likely the database is corrupt now that you booted without letting recovery do its job. Just because there are no applications accessing the db does not mean it is safe to delete the log files. Derby maintains an in-memory cache of data that, along with the info in the recovery log files, works to produce a consistent database. The only time all the data is consistent is after shutting down cleanly with "shutdown=true", which gives the system a chance to write out all its dirty cached data (basically it does a checkpoint before shutting down).

You should at least run the consistency checker across all tables to catch obvious corruptions, but even if this passes there may still be problems with the database.

Did you happen to keep a copy of the problem db with its log files before corrupting it by deleting its log files? There have been some tools posted to JIRA that allow looking at what is going on in the log files. If not, did you keep a listing of the files? It would be interesting to at least see the names and sizes of the files in the log directory.

Do you know if the customer uses two-phase commit at all? This is a really easy way to keep a very old transaction in the log file. These can stay around after clients crash or exit if not properly committed or aborted.

Bergquist, Brett wrote:
> Thanks for taking the time to respond, Knut. It is much appreciated.
>
> Some information:
>
> The log files total 64 GB of disk space, so this clearly went way past the 10 MB of transaction log.
>
> So, the point that "the checkpoint will only delete log files older than the oldest transaction that's still alive" is important, I think. So if there was a stuck transaction that somehow occurred on July 12, for example, then this could in theory have kept the transaction logs around until last night, correct?
>
> Unfortunately the system administrator had already shut down the database engine before he called me. It would not boot the database in a reasonable time. I was looking at iostat and it looked like it was doing about 1 MB/s, and I used truss to take a look at the system calls; after quite a while (a couple of hours of trying to boot the database) it was still processing transaction log files from July 13th. I did a quick calculation and it looked like it would take somewhere around 17 hours to boot the database.
>
> I looked at the last online backup that the customer had made and, again, it had many thousands of transaction logs in the backup database, so that was not useful either.
>
> I only had one option. I knew the system was in a quiet state as there were no applications accessing the database. I know this is not recommended, but I had no choice but to remove the old transaction log files and boot the database. It came up okay and is in operation okay, so I think it will be all right, but it could possibly have corruption. I had to take the chance, however.
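For next time, here is a rough ij sketch of the table-by-table consistency check Mike suggests above. It just walks SYS.SYSTABLES and calls SYSCS_UTIL.SYSCS_CHECK_TABLE on each user table; the connection URL is a placeholder for the real database path:

    connect 'jdbc:derby:/path/to/productionDB';

    -- SYSCS_CHECK_TABLE returns 1 for a table that checks out and raises
    -- an error if it finds an inconsistency in that table.
    SELECT s.schemaname, t.tablename,
           SYSCS_UTIL.SYSCS_CHECK_TABLE(s.schemaname, t.tablename) AS ok
    FROM sys.systables t JOIN sys.sysschemas s ON t.schemaid = s.schemaid
    WHERE t.tabletype = 'T';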
> I am going to monitor the system and use syscs_diag.transaction_table to query the transactions if I see this happen again. Just a note, however: something similar did happen a week ago and I looked at the transactions and it showed none, even though there were thousands of transaction log files around. So a question: does the online backup show up as a transaction in syscs_diag.transaction_table? Also, a week ago there were no locks reported by syscs_diag.lock_table (at least in the snapshot of that table that I looked at).
>
> Again, if there is anything that anyone can think of that I should look at if I see this happen again, please shout out.
>
> Brett
>
> -----Original Message-----
> From: Knut Anders Hatlen [mailto:knut.hatlen@oracle.com]
> Sent: Tuesday, July 19, 2011 9:41 AM
> To: derby-dev@db.apache.org
> Subject: Re: Question on log directory of a derby database
>
> "Bergquist, Brett" writes:
>
>> I have a database in production that has been running fine for a few
>> years. It started out having about 100K inserts per day into it and
>> now is up to about 4.6M inserts per day, and this has been working
>> fine.
>>
>> Tonight the customer called because the system was chewing up disk
>> space. I had the customer restart the database engine and it is taking
>> a long time to boot the database. I had the customer check the "log"
>> directory in the database and there were 62K ".dat" files present.
>>
>> So I am assuming that these are for transactions that have not
>> committed, correct?
>
> Yes, but they are not cleaned up until a checkpoint has run (by default
> that happens when you have 10MB of transaction log), so they may contain
> committed transactions too. And the checkpoint will only delete log
> files older than the oldest transaction that's still alive.
>
>> But for the life of me, I cannot figure out what
>> transaction could have been in progress and not committed since July
>> 12th. It seems to me this would have exhausted memory or some other
>> resource by now.
>>
>> One other point: an online database backup is done each night by the
>> customer. Could this trigger anything like this?
>
> Yes, an ongoing online backup will prevent deletion of log files, since
> it needs them to track modifications that happen while it copies the
> database.
>
> It could also happen if log archiving has been enabled (using the
> SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE procedure). You can
> tell whether log archiving is enabled by looking for a line that says
>
> derby.storage.logArchiveMode=true
>
> in the service.properties file in the database directory.
>
>> Tonight when running
>> a utility against the database, the utility failed to acquire locks,
>> but there should have been nothing else running but this utility, and
>> it is single-threaded, so there should have been no lock contention.
>> It also acts like there is a database backup that is still ongoing...
>
> I don't think an online backup needs many locks. If you connect to the
> database using ij and execute SELECT * FROM SYSCS_DIAG.LOCK_TABLE you
> should see which locks are held, which might give some clues.
>
>> Right now, I am just waiting for the database to clean up and boot so
>> that I can get in and examine it. Is there any shortcut or express way
>> to boot the database? Is there any way to monitor the progress of
>> this boot cleanup?
>
> I don't know of a way to speed it up.
> There is a flag that makes debug builds print more info to derby.log
> during the recovery phase (-Dderby.debug.true=LogTrace, I think), but it
> may be too low-level to get much useful info in this case.
>
>> Any thoughts or pointers in trying to figure out what is going on will
>> be greatly appreciated.
>>
>> The database in question is Derby 10.5.1.
>>
>> Brett
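For the record, here is a rough ij sketch of what I plan to run the next time the log directory starts to grow, using the SYSCS_DIAG tables mentioned above (again, the connection URL is a placeholder for the real database path):

    connect 'jdbc:derby:/path/to/productionDB';

    -- Any transaction still listed here can keep old log files alive, so
    -- look for long-running or stuck entries (including the backup).
    SELECT * FROM SYSCS_DIAG.TRANSACTION_TABLE;

    -- Show which locks are currently held and by which transactions.
    SELECT * FROM SYSCS_DIAG.LOCK_TABLE;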
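And for completeness, a sketch of the clean-shutdown sequence Mike refers to, again with a placeholder database path. Forcing a checkpoint first is optional; per the discussion above it can only reclaim log files older than the oldest transaction that is still alive:

    connect 'jdbc:derby:/path/to/productionDB';

    -- Ask Derby to checkpoint now rather than waiting for the 10 MB default.
    CALL SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();

    -- Shut the database down cleanly so all dirty cached data is written out.
    -- ij reports ERROR 08006 here; that is the normal "database shut down" signal.
    connect 'jdbc:derby:/path/to/productionDB;shutdown=true';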