From: "Bergquist, Brett"
To: "derby-dev@db.apache.org"
Date: Tue, 19 Jul 2011 10:34:25 -0700
Subject: RE: Question on log directory of a derby database

I will try to get a copy of the backup database. It will probably take a while. I have the system administrator monitoring the transaction log directory, and he will call me if it starts to grow again. At that point I will use ij and try to look at the transactions table and locks table.
On July 6th a similar thing started happening, and examining the transactions table showed no transactions. I am going to work on this and see what I can find out.

Again, I appreciate all of the feedback and suggestions and help.

Brett

-----Original Message-----
From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
Sent: Tuesday, July 19, 2011 1:16 PM
To: derby-dev@db.apache.org
Subject: Re: Question on log directory of a derby database

I would suggest you boot the backup database somewhere where you can
let it run and let it finish, and then shut it down cleanly with
shutdown=true. And then boot it again and see if that fixes the
problem.

If you have the ability to run with a debug server we can give you
debug options that will print some interesting stuff to the log that
may give clues about the oldest transaction and checkpoint and maybe
tell why so much log is being kept.

Bergquist, Brett wrote:
> Thanks for taking the time to respond Knut. It is much appreciated.
>
> Some information:
>
> The log files total 64 GB of disk space. So this clearly went way past the 10 MB of transaction log.
>
> So the "And the checkpoint will only delete log files older than the oldest transaction that's still alive." That is important I think. So if there was a stuck transaction somehow that occurred on July 12, for example, then this could in theory keep the transaction logs around until last night, correct?
>
> Unfortunately the system administrator had already shut down the database engine before he called me. It would not boot the database in a reasonable time. I was looking at iostat and it looked like it was doing about 1 MB/s, and I used truss to take a look at the system calls and it was processing transaction log files from July 13th after quite a while (a couple of hours of trying to boot the database). I did a quick calculation and it looked like it would take somewhere around 17 hours to boot the database.
>
> I looked at the last online backup that the customer had made and again, it had many thousands of transaction logs in the backup database, so that was not useful either.
>
> I only had one option. I knew the system was in a quiet state as there were no applications accessing the database. I know this is not recommended but I had no choice but to remove the old transaction log files and boot the database. It came up okay and is in operation okay so I think it will be alright but it could possibly have corruption. I had to take the chance however.
>
> I am going to monitor the system and use the syscs_diag.transaction_table to query the transactions if I see this happen again. Just a note however, something similar did happen a week ago and I looked at the transactions and it showed none even though there were thousands of transaction log files around. So a question, does the online backup show up as a transaction in the syscs_diag.transaction_table? Also, a week ago, there were no locks as reported by the syscs_diag.lock_table (at least the snapshot of querying this table that I looked at).
>
> Again if there is anything that anyone can think of that I should look at if I see this happen again, please shout out.
>
> Brett
>
> -----Original Message-----
> From: Knut Anders Hatlen [mailto:knut.hatlen@oracle.com]
> Sent: Tuesday, July 19, 2011 9:41 AM
> To: derby-dev@db.apache.org
> Subject: Re: Question on log directory of a derby database
>
> "Bergquist, Brett" writes:
>
>> I have a database in production that has been running fine for a few
>> years. It started out having about 100K inserts per day into it and
>> now is up to about 4.6M inserts per day and this has been working
>> fine.
>>
>> Tonight the customer called because the system was chewing up disk
>> space. I had the customer restart the database engine and it is taking
>> a long time to boot the database.
>> I had the customer check the "log"
>> directory in the database and there were 62K ".dat" files present.
>>
>> So I am assuming that these are for transactions that have not
>> committed, correct?
>
> Yes, but they are not cleaned up until a checkpoint has run (by default
> that happens when you have 10MB of transaction log), so they may contain
> committed transactions too. And the checkpoint will only delete log
> files older than the oldest transaction that's still alive.
>
>> But for the life of me, I cannot figure out what
>> transaction could have been in progress and not committed since July
>> 12th. It seems to me this would have exhausted memory or some other
>> resource by now.
>>
>> One other point, an online database backup is done each night by the
>> customer. Could this trigger anything like this?
>
> Yes, an ongoing online backup will prevent deletion of log files, since
> it needs them to track modifications that happen while it copies the
> database.
>
> It could also happen if log archiving has been enabled (using the
> SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE procedure). You can
> tell whether log archiving is enabled by looking for a line that says
>
> derby.storage.logArchiveMode=true
>
> in the service.properties file in the database directory.
>
>> Tonight when running
>> a utility against the database, the utility failed to acquire locks,
>> but there should have been nothing else running but this utility and
>> it is single threaded, so there should have been no lock contention.
>> It also acts like there is a database backup that is still ongoing...
>
> I don't think an online backup needs many locks. If you connect to the
> database using ij and execute SELECT * FROM SYSCS_DIAG.LOCK_TABLE you
> should see which locks are held, which might give some clues.
>
>> Right now, I am just waiting for the database to clean up and boot so
>> that I can get in and examine it.
>> Is there any shortcut or express way
>> to boot the database? Is there any way to monitor the progress of
>> this boot cleanup?
>
> I don't know of a way to speed it up. There is a flag that makes debug
> builds print more info to derby.log during the recovery phase
> (-Dderby.debug.true=LogTrace, I think), but it may be too low-level to
> get much useful info in this case.
>
>> Any thoughts or pointers in trying to figure out what is going on will
>> be greatly appreciated.
>>
>> The database in question is Derby 10.5.1
>>
>> Brett
>
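
The file-system checks discussed in this thread (looking for derby.storage.logArchiveMode=true in service.properties, counting the ".dat" files in the "log" directory, and estimating how long it would take to read them back) could be scripted roughly as follows. This is only a minimal sketch using the Java standard library: the class name DerbyLogDirCheck and its methods are hypothetical and not part of Derby, it assumes the database directory is passed as the first argument, and it assumes the roughly 1 MB/s read rate observed above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper, not part of Derby: inspects a database directory from the file system only. */
public class DerbyLogDirCheck {

    /** Returns true if service.properties in dbDir sets derby.storage.logArchiveMode=true. */
    public static boolean logArchiveEnabled(Path dbDir) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(dbDir.resolve("service.properties"))) {
            props.load(in);
        }
        String value = props.getProperty("derby.storage.logArchiveMode", "false");
        return Boolean.parseBoolean(value.trim());
    }

    /** Returns {number of .dat files, total size in bytes} found directly in dbDir/log. */
    public static long[] logFileStats(Path dbDir) throws IOException {
        long count = 0;
        long bytes = 0;
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(dbDir.resolve("log"), "*.dat")) {
            for (Path f : files) {
                count++;
                bytes += Files.size(f);
            }
        }
        return new long[] {count, bytes};
    }

    /** Rough number of hours needed to read totalBytes at the given throughput. */
    public static double estimatedHours(long totalBytes, double bytesPerSecond) {
        return totalBytes / bytesPerSecond / 3600.0;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the database directory is the first argument, else the current directory.
        Path dbDir = Paths.get(args.length > 0 ? args[0] : ".");
        long[] stats = logFileStats(dbDir);
        System.out.println("log archive mode: " + logArchiveEnabled(dbDir));
        System.out.println(".dat files in log/: " + stats[0] + ", bytes: " + stats[1]);
        // Assumption: about 1 MB/s, the rate reported from iostat in this thread.
        System.out.printf("estimated read time at 1 MB/s: %.1f hours%n",
                estimatedHours(stats[1], 1024.0 * 1024.0));
    }
}
```

Run periodically against the database directory, this would show whether the log directory is growing again. For 64 GB of log read at 1 MB/s the estimate comes to roughly 18 hours, the same order as the 17-hour figure above. It does not replace querying SYSCS_DIAG.TRANSACTION_TABLE and SYSCS_DIAG.LOCK_TABLE from ij, which needs a booted database.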