From: "Bergquist, Brett"
To: "derby-dev@db.apache.org"
Date: Tue, 19 Jul 2011 10:37:29 -0700
Subject: RE: Question on log directory of a derby database

I did get a sampling of the transaction logs over the days.
The oldest ones were July 12th and the newest ones were July 18th. I had the sysadmin copy some of these aside so I can get a sample from every couple of days.

Will the utility that you mention look at specific transaction log files? I think I saw the utility mentioned in a JIRA when I was searching last night, so I can probably find it myself.

-----Original Message-----
From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
Sent: Tuesday, July 19, 2011 1:10 PM
To: derby-dev@db.apache.org
Subject: Re: Question on log directory of a derby database

It is likely the database is corrupt now that you booted without letting recovery do its job. Just because there are no applications accessing the db does not mean it is safe to delete the log files. Derby maintains an in-memory cache of data that, along with the info in the recovery log files, works to produce a consistent database. The only time all the data is consistent is after shutting down cleanly with "shutdown=true", which gives the system a chance to write out all its dirty cached data (basically it does a checkpoint before shutting down).

You should at least run the consistency checker across all tables to catch obvious corruptions, but even if this passes there may still be problems with the database.

Did you happen to keep a copy of the problem db with its log files before corrupting it by deleting its log files? There have been some tools posted to JIRA that allow looking at what is going on in the log files. If not, did you keep a listing of the files? It would be interesting to at least see the names and sizes of the files in the log directory.

Do you know if the customer uses two-phase commit at all? This is a really easy way to keep a very old transaction in the log file. These can stay around after clients crash or exit if not properly committed or aborted.

Bergquist, Brett wrote:
> Thanks for taking the time to respond, Knut. It is much appreciated.
>
> Some information:
>
> The log files total 64 GB of disk space, so this clearly went way past the 10 MB of transaction log.
>
> So, the point that "the checkpoint will only delete log files older than the oldest transaction that's still alive" is important, I think. So if there was a stuck transaction that somehow occurred on July 12, for example, then this could in theory have kept the transaction logs around until last night, correct?
>
> Unfortunately the system administrator had already shut down the database engine before he called me. It would not boot the database in a reasonable time. I was looking at iostat and it looked like it was doing about 1 MB/s, and I used truss to take a look at the system calls; after quite a while (a couple of hours of trying to boot the database) it was still processing transaction log files from July 13th. I did a quick calculation and it looked like it would take somewhere around 17 hours to boot the database.
>
> I looked at the last online backup that the customer had made and, again, it had many thousands of transaction logs in the backup database, so that was not useful either.
>
> I only had one option. I knew the system was in a quiet state as there were no applications accessing the database. I know this is not recommended, but I had no choice but to remove the old transaction log files and boot the database. It came up okay and is in operation okay, so I think it will be all right, but it could possibly have corruption. I had to take the chance, however.
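For next time, here is a rough ij sketch of the table-by-table consistency check Mike suggests above. It just walks SYS.SYSTABLES and calls SYSCS_UTIL.SYSCS_CHECK_TABLE on each user table; the connection URL is a placeholder for the real database path:

    connect 'jdbc:derby:/path/to/productionDB';

    -- SYSCS_CHECK_TABLE returns 1 for a table that checks out and raises
    -- an error if it finds an inconsistency in that table.
    SELECT s.schemaname, t.tablename,
           SYSCS_UTIL.SYSCS_CHECK_TABLE(s.schemaname, t.tablename) AS ok
    FROM sys.systables t JOIN sys.sysschemas s ON t.schemaid = s.schemaid
    WHERE t.tabletype = 'T';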
> I am going to monitor the system and use syscs_diag.transaction_table to query the transactions if I see this happen again. Just a note, however: something similar did happen a week ago and I looked at the transactions and it showed none, even though there were thousands of transaction log files around. So a question: does the online backup show up as a transaction in syscs_diag.transaction_table? Also, a week ago there were no locks reported by syscs_diag.lock_table (at least in the snapshot of that table that I looked at).
>
> Again, if there is anything that anyone can think of that I should look at if I see this happen again, please shout out.
>
> Brett
>
> -----Original Message-----
> From: Knut Anders Hatlen [mailto:knut.hatlen@oracle.com]
> Sent: Tuesday, July 19, 2011 9:41 AM
> To: derby-dev@db.apache.org
> Subject: Re: Question on log directory of a derby database
>
> "Bergquist, Brett" writes:
>
>> I have a database in production that has been running fine for a few
>> years. It started out having about 100K inserts per day into it and
>> now is up to about 4.6M inserts per day, and this has been working
>> fine.
>>
>> Tonight the customer called because the system was chewing up disk
>> space. I had the customer restart the database engine and it is taking
>> a long time to boot the database. I had the customer check the "log"
>> directory in the database and there were 62K ".dat" files present.
>>
>> So I am assuming that these are for transactions that have not
>> committed, correct?
>
> Yes, but they are not cleaned up until a checkpoint has run (by default
> that happens when you have 10MB of transaction log), so they may contain
> committed transactions too. And the checkpoint will only delete log
> files older than the oldest transaction that's still alive.
>
>> But for the life of me, I cannot figure out what
>> transaction could have been in progress and not committed since July
>> 12th. It seems to me this would have exhausted memory or some other
>> resource by now.
>>
>> One other point: an online database backup is done each night by the
>> customer. Could this trigger anything like this?
>
> Yes, an ongoing online backup will prevent deletion of log files, since
> it needs them to track modifications that happen while it copies the
> database.
>
> It could also happen if log archiving has been enabled (using the
> SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE procedure). You can
> tell whether log archiving is enabled by looking for a line that says
>
> derby.storage.logArchiveMode=true
>
> in the service.properties file in the database directory.
>
>> Tonight when running
>> a utility against the database, the utility failed to acquire locks,
>> but there should have been nothing else running but this utility, and
>> it is single-threaded, so there should have been no lock contention.
>> It also acts like there is a database backup that is still ongoing...
>
> I don't think an online backup needs many locks. If you connect to the
> database using ij and execute SELECT * FROM SYSCS_DIAG.LOCK_TABLE you
> should see which locks are held, which might give some clues.
>
>> Right now, I am just waiting for the database to clean up and boot so
>> that I can get in and examine it. Is there any shortcut or express way
>> to boot the database? Is there any way to monitor the progress of
>> this boot cleanup?
>
> I don't know of a way to speed it up.
> There is a flag that makes debug builds print more info to derby.log
> during the recovery phase (-Dderby.debug.true=LogTrace, I think), but it
> may be too low-level to get much useful info in this case.
>
>> Any thoughts or pointers in trying to figure out what is going on will
>> be greatly appreciated.
>>
>> The database in question is Derby 10.5.1.
>>
>> Brett
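For the record, here is a rough ij sketch of what I plan to run the next time the log directory starts to grow, using the SYSCS_DIAG tables mentioned above (again, the connection URL is a placeholder for the real database path):

    connect 'jdbc:derby:/path/to/productionDB';

    -- Any transaction still listed here can keep old log files alive, so
    -- look for long-running or stuck entries (including the backup).
    SELECT * FROM SYSCS_DIAG.TRANSACTION_TABLE;

    -- Show which locks are currently held and by which transactions.
    SELECT * FROM SYSCS_DIAG.LOCK_TABLE;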
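And for completeness, a sketch of the clean-shutdown sequence Mike refers to, again with a placeholder database path. Forcing a checkpoint first is optional; per the discussion above it can only reclaim log files older than the oldest transaction that is still alive:

    connect 'jdbc:derby:/path/to/productionDB';

    -- Ask Derby to checkpoint now rather than waiting for the 10 MB default.
    CALL SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();

    -- Shut the database down cleanly so all dirty cached data is written out.
    -- ij reports ERROR 08006 here; that is the normal "database shut down" signal.
    connect 'jdbc:derby:/path/to/productionDB;shutdown=true';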