db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Question on log directory of a derby database
Date Tue, 19 Jul 2011 19:58:32 GMT
Did you change the size of the Derby buffer cache?  One problem Derby can
have at these performance levels is that the checkpoint has to compete
with I/O from the regular transactions; as your load increases, this
becomes more and more of a problem.
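If the cache is still at its defaults, the usual starting points are the page cache size and the checkpoint interval. A minimal derby.properties sketch, assuming the property names and defaults documented for the 10.x line (the values are illustrative, not recommendations):

```properties
# derby.properties -- illustrative values only; tune for your hardware.
# Page cache size in pages (default 1000); a larger cache reduces how
# often dirty pages must be written back under load.
derby.storage.pageCacheSize=10000
# Bytes of log written between checkpoints (default ~10MB); raising it
# makes checkpoints less frequent, but crash recovery takes longer.
derby.storage.checkpointInterval=104857600
```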

62k files does seem crazy, and I have seen OS problems in the past where
performance suffers just from dealing with a large number of files in a
directory.  It may help to get some key stats about your performance
level.  For the log it would be useful to know how many bytes/second your
application is generating.  Hopefully you can just look at the times on
the log files and get a quick estimate.
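That quick estimate can be sketched from the file sizes and modification times alone. A rough shell sketch (GNU coreutils assumed; the log path is a placeholder for your database's log directory):

```shell
#!/bin/sh
# Rough log-generation rate: total bytes of logN.dat files divided by
# the time span between the oldest and newest file (GNU stat/du/ls).
log_rate() {
    dir=$1
    oldest=$(ls -tr "$dir"/*.dat | head -1)           # oldest by mtime
    newest=$(ls -tr "$dir"/*.dat | tail -1)           # newest by mtime
    bytes=$(du -cb "$dir"/*.dat | tail -1 | cut -f1)  # total bytes
    secs=$(( $(stat -c %Y "$newest") - $(stat -c %Y "$oldest") ))
    [ "$secs" -gt 0 ] || secs=1                       # avoid divide by zero
    echo $(( bytes / secs ))
}
# Example: log_rate /databases/mydb/log
```

This only approximates the rate over the retained window, but it is usually enough to compare against what the log device can sustain.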

I don't have enough information, but 4.6 million inserts a day seems
aggressive and may need some hardware tuning for Derby to handle it
well.  You should look at how much you are logging per second and
compare that with the expected I/O rate of your log device.  Derby does
not have many tools to help with I/O performance; it mostly counts on
the OS to provide those.  So to get write performance it may help
to stripe the disk at the OS level.  Derby does allow you to split data
I/O and log I/O across multiple disks.
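Splitting data and log I/O is done with the logDevice connection attribute when the database is created; a sketch with made-up paths:

```
ij> connect 'jdbc:derby:/disk1/data/mydb;create=true;logDevice=/disk2/mydb-log';
```

For an existing database, the Server and Administration Guide describes a procedure for relocating the log directory using the same attribute; verify it against the docs for your release before trying it on production.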

Bergquist, Brett wrote:
> Good idea on the backup strategy!
> The one problem that I have is that I have no access to the customer's system, and it
> is pretty much running hands off.  I do have to say that we have been using Derby for
> about 5 years and have never had a problem until these last couple of weeks.  So when
> it is working correctly, nobody is watching ;)
> Nothing in the production environment has changed other than the volume of data.
> One of the main functions is to collect network performance test results.  Our
> equipment generates network performance tests continuously, and over the last two
> months we have seen growth from about 1.3M/day to 4.6M/day.
> I am going to make an improvement in the Method Of Procedure used for the backup
> strategy to incorporate the consistency checking and boot verification.
> -----Original Message-----
> From: Mike Matrigali [mailto:mikem_app@sbcglobal.net] 
> Sent: Tuesday, July 19, 2011 2:31 PM
> To: derby-dev@db.apache.org
> Subject: Re: Question on log directory of a derby database
> I believe the consistency checker is going to lock each table for the
> duration of the checking.  The time it takes depends on the size of the
> table.
> When possible, post a directory listing of the files in the log directory
> with the following: size of each file, date modified, and name of file.
> On unix this is just an "ls -l".  Doing this for the backup would help
> also.
> I do suggest you look at your backup strategy.  We suggest that all
> backups should be verified after taking them.  So at least they should
> be booted after making them to verify they have no errors.  Also useful
> is to run the consistency checker on them.  This can be done on a
> different machine if necessary; Derby databases are portable across
> devices/machines.
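A sketch of that check-every-table pass, assuming the standard SYSCS_UTIL.SYSCS_CHECK_TABLE function (it returns 1 when a table checks out and raises an error otherwise):

```sql
-- Run the consistency checker across every user table in the backup copy.
SELECT s.schemaname, t.tablename,
       SYSCS_UTIL.SYSCS_CHECK_TABLE(s.schemaname, t.tablename) AS ok
FROM sys.sysschemas s, sys.systables t
WHERE s.schemaid = t.schemaid AND t.tabletype = 'T';
```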
> Bergquist, Brett wrote:
>> I did get a sampling of the transaction logs over the days.  The oldest ones were
>> July 12th and the newest ones were July 18th.  I had the sysadmin copy some of these
>> aside so I can get a sample of every couple of days.
>> Will the utility that you mention look at specific transaction log files?  I think
>> I saw the utility mentioned in a JIRA when I was searching last night, so I can
>> probably find it myself.
>> -----Original Message-----
>> From: Mike Matrigali [mailto:mikem_app@sbcglobal.net] 
>> Sent: Tuesday, July 19, 2011 1:10 PM
>> To: derby-dev@db.apache.org
>> Subject: Re: Question on log directory of a derby database
>> It is likely the database is corrupt now that you booted without letting
>> recovery do its job.  Just because there are no applications accessing
>> the db does not mean it is safe to delete the log files.  Derby
>> maintains an in-memory cache of data that, along with the info in the
>> recovery log files, works to produce a consistent database.  The only
>> time all the data is consistent is after shutting down cleanly with
>> "shutdown=true", giving the system a chance to write out all its dirty
>> cached data (basically it does a checkpoint before shutting down).
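In ij, that clean shutdown looks roughly like this (the database name is a placeholder; Derby reports success for a single-database shutdown with SQLState 08006, which ij shows as an "ERROR" even though it is expected):

```sql
-- Triggers a final checkpoint and flushes dirty pages before shutdown.
connect 'jdbc:derby:mydb;shutdown=true';
```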
>> You should at least run the consistency checker across all tables to
>> catch obvious corruptions, but even if this passes there may still be
>> problems with the database.
>> Did you happen to keep a copy of the problem db with its log files
>> before corrupting it by deleting its log files?  There have been some
>> tools posted to JIRA that can allow looking at what is going on in the
>> log files.  If not, did you keep a listing of the files?
>> It would be interesting to at least see the names and sizes of the files
>> in the log directory.
>> Do you know if the customer uses two-phase commit at all?  This is a
>> really easy way to keep a very old transaction in the log file.  These
>> transactions can stay around after clients crash or exit if they are not
>> properly committed or aborted.
>> Bergquist, Brett wrote:
>>> Thanks for taking the time to respond Knut.  It is much appreciated.
>>> Some information:
>>> The log files total 64GB of disk space.  So this clearly went way past the 10MB of
>>> transaction log.
>>> So the "And the checkpoint will only delete log files older than the oldest
>>> transaction that's still alive" part is important, I think.  So if there was a stuck
>>> transaction that somehow occurred on July 12, for example, then this could in theory
>>> keep the transaction logs around until last night, correct?
>>> Unfortunately the system administrator had already shut down the database engine
>>> before he called me.  It would not boot the database in a reasonable time.  I was
>>> looking at iostat and it looked like it was doing about 1MB/s, and I used truss to
>>> take a look at the system calls; after quite a while (a couple of hours of trying to
>>> boot the database) it was still processing transaction log files from July 13th.  I
>>> did a quick calculation and it looked like it would take somewhere around 17 hours
>>> to boot the database.
>>> I looked at the last online backup that the customer had made, and again, it had
>>> many thousands of transaction logs in the backup database, so that was not useful
>>> either.

>>> I only had one option.  I knew the system was in a quiet state, as there were no
>>> applications accessing the database.  I know this is not recommended, but I had no
>>> choice but to remove the old transaction log files and boot the database.  It came
>>> up okay and is operating okay, so I think it will be alright, but it could possibly
>>> have corruption.  I had to take the chance, however.
>>> I am going to monitor the system and use syscs_diag.transaction_table to query the
>>> transactions if I see this happen again.  Just a note, however: something similar
>>> did happen a week ago, and I looked at the transactions and it showed none, even
>>> though there were thousands of transaction log files around.  So a question: does
>>> the online backup show up as a transaction in syscs_diag.transaction_table?  Also,
>>> a week ago, there were no locks reported by syscs_diag.lock_table (at least in the
>>> snapshot of that table that I looked at).
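For that monitoring, a query along these lines (column names as documented for SYSCS_DIAG.TRANSACTION_TABLE in the 10.x releases) shows which transaction is pinning the oldest log record:

```sql
-- FIRST_INSTANT is the oldest log record a transaction still needs;
-- the log cannot be truncated past the smallest one still alive.
SELECT xid, username, type, status, first_instant
FROM syscs_diag.transaction_table
ORDER BY xid;
```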
>>> Again, if there is anything that anyone can think of that I should look at if I see
>>> this happen again, please shout out.
>>> Brett
>>> -----Original Message-----
>>> From: Knut Anders Hatlen [mailto:knut.hatlen@oracle.com] 
>>> Sent: Tuesday, July 19, 2011 9:41 AM
>>> To: derby-dev@db.apache.org
>>> Subject: Re: Question on log directory of a derby database
>>> "Bergquist, Brett" <BBergquist@canoga.com> writes:
>>>> I have a database in production that has been running fine for a few
>>>> years. It started out having about 100K inserts per day into it and
>>>> now is up to about 4.6M inserts per day and this has been working
>>>> fine.
>>>> Tonight the customer called because the system was chewing up disk
>>>> space. I had the customer restart the database engine and it is taking
>>>> a long time to boot the database. I had the customer check the "log"
>>>> directory in the database and there were 62K ".dat" files present.
>>>> So I am assuming that these are for transactions that have not
>>>> committed, correct?
>>> Yes, but they are not cleaned up until a checkpoint has run (by default
>>> that happens when you have 10MB of transaction log), so they may contain
>>> committed transactions too. And the checkpoint will only delete log
>>> files older than the oldest transaction that's still alive.
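For reference, a checkpoint can also be forced by hand from ij, which gives old log files a chance to be reclaimed once no live transaction (and no running backup) still needs them; this is the standard system procedure, but verify it against the 10.5 reference manual:

```sql
CALL SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();
```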
>>>> But for the life of me, I cannot figure out what
>>>> transaction could have been in progress and not committed since July
>>>> 12th. It seems to me this would have exhausted memory or some other
>>>> resource by now.
>>>> One other point, an online database backup is done each night by the
>>>> customer. Could this trigger anything like this?
>>> Yes, an ongoing online backup will prevent deletion of log files, since
>>> it needs them to track modifications that happen while it copies the
>>> database.
>>> It could also happen if log archiving has been enabled (using the
>>> SYSCS_UTIL.SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE procedure). You can
>>> tell whether log archiving is enabled by looking for a line that says
>>>   derby.storage.logArchiveMode=true
>>> in the service.properties file in the database directory.
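A quick way to make that check from a unix shell, sketched with a placeholder database path:

```shell
#!/bin/sh
# Prints the line and returns success if log archiving is enabled in
# the given database directory's service.properties; silent otherwise.
log_archiving_on() {
    grep '^derby.storage.logArchiveMode=true' "$1/service.properties"
}
# Example: log_archiving_on /databases/mydb
```

If archiving turns out to be on and is not wanted, the reference manual documents SYSCS_UTIL.SYSCS_DISABLE_LOG_ARCHIVE_MODE for turning it off from ij; check the details for your release.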
>>>> Tonight when running
>>>> a utility against the database, the utility failed to acquire locks,
>>>> but there should have been nothing else running but this utility, and
>>>> it is single threaded, so there should have been no lock contention.
>>>> It also acts like there is a database backup that is still ongoing...
>>> I don't think an online backup needs many locks. If you connect to the
>>> database using ij and execute SELECT * FROM SYSCS_DIAG.LOCK_TABLE you
>>> should see which locks are held, which might give some clues.
>>>> Right now, I am just waiting for the database to clean up and boot so
>>>> that I can get in and examine it. Is there any shortcut or express way
>>>> to boot the database? Is there any way to monitor the progress of
>>>> this boot cleanup?
>>> I don't know of a way to speed it up. There is a flag that makes debug
>>> builds print more info to derby.log during the recovery phase
>>> (-Dderby.debug.true=LogTrace, I think), but it may be too low-level to
>>> get much useful info in this case.
>>>> Any thoughts or pointers in trying to figure out what is going on will
>>>> be greatly appreciated.
>>>> The database in question is Derby 10.5.1
>>>> Brett
