db-derby-dev mailing list archives

From "RPost" <rp0...@pacbell.net>
Subject Re: Derby architecture/design documents
Date Wed, 02 Feb 2005 00:30:02 GMT
> "Mike Matrigali" wrote:

> 2) I would say derby supports media recovery.  One can make a backup of
>    the data and store it off line.  Logs can be stored on a separate
>    disk from the data, and if you lose your data disk then you can use
>    rollforward recovery on the existing logs and the copy of the backup
>    to bring your database up to the current point in time.
> 3) Derby does not support "point in time recovery".  Someone may want to
>    look at this in the future.  Technically I don't think it would be
>    very hard as the logging system has the stuff to solve the hard
>    problems.  It does not have an idea about "time" - it just knows log
>    sequence numbers, so need to figure out what kind of interface a user
>    really wants.  A very user unfriendly interface would not be very
>    hard to implement which would be recover to a specific log sequence
>    number.  Anyone interested in this feature should add it to jira -
>    I'll be happy to add technical comments on what needs to be done.
> 4) A reasonable next step in derby recovery progress would be to add a
>    way to automatically move/copy log files offline as they are not
>    needed by crash recovery and only needed for media recovery.  Some
>    sort of java stored procedure callout would seem most appropriate.

It sounds like the ability to do "point in time recovery" (3 above) may only
require the log to know about "time" and then perform a "media recovery" (2
above) using the appropriate backup copy and the relevant "time" related
portions of the log.
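
As a rough illustration of that idea, here is a minimal Python sketch. It assumes a hypothetical log record that carries a wall-clock timestamp next to its log sequence number; the record layout and the names `LogRecord`, `lsn_for_time` and `recover_to_time` are invented for this example and are not Derby's log format or API. It also ignores transaction boundaries, which a real implementation would have to respect.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int          # log sequence number, strictly increasing
    timestamp: float  # hypothetical: time stamped on the record when written
    key: str
    value: str


def lsn_for_time(log, target_time):
    """Return the highest LSN whose timestamp is <= target_time, or None.

    Assumes the log is ordered by LSN and timestamps never decrease.
    """
    times = [r.timestamp for r in log]
    i = bisect_right(times, target_time)
    return log[i - 1].lsn if i else None


def recover_to_time(backup, log, target_time):
    """Start from a backup copy and roll the log forward up to target_time."""
    db = dict(backup)  # never modify the backup itself
    stop = lsn_for_time(log, target_time)
    if stop is None:
        return db      # nothing in the log is old enough to apply
    for r in log:
        if r.lsn > stop:
            break
        db[r.key] = r.value
    return db
```

The point of the sketch is that "time" only has to be translated into a log sequence number once; after that, the recovery is the same rollforward as in (2), stopped early.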

Is the entire log needed or can compensation and other log entries be
stripped out to leave only the pure committed changes?
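
To make the question concrete, here is a small Python sketch of what such stripping could look like. It assumes a hypothetical log in which every record names its transaction and every compensation record points at the update it undoes; the record kinds and field names are invented for illustration, and whether Derby's log carries exactly this information is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    txn: int
    kind: str                     # "UPDATE", "CLR" (compensation) or "COMMIT"
    undoes: Optional[int] = None  # for a CLR: LSN of the update it compensates


def committed_changes(log):
    """Keep only updates that belong to a committed transaction and were
    never compensated (for example by a rollback to a savepoint)."""
    committed = {r.txn for r in log if r.kind == "COMMIT"}
    undone = {r.undoes for r in log if r.kind == "CLR"}
    return [
        r for r in log
        if r.kind == "UPDATE" and r.txn in committed and r.lsn not in undone
    ]
```

Note that dropping the compensation records alone would not be enough: the updates they undo have to be dropped with them, even inside a transaction that later commits.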

Re move/copy log files offline (4 above):

Would it be possible to achieve the equivalent of Oracle's "change data
capture" (CDC) facility with this? CDC allows users to save changes made to
the database. Oracle's version uses a PUBLISH/SUBSCRIBE mechanism but I
think we could use your move/copy log file approach very effectively.

A major issue/problem for many current data warehouse applications is
extracting the "changed" data from source systems so that the changes can be
applied via ETL to the target data warehouse tables. The implementations I
have worked on typically include "audit" columns (created by, created date,
modified by, modified date) on the tables; PeopleSoft uses this approach.
There are major problems finding "changed" data using these columns:
1) the columns are often not accurate because, without triggers, data can
   be updated without updating the audit columns,
2) you need indexes on the audit columns to find the changed records,
3) audit columns cannot provide information about deleted rows,
4) audit columns cannot provide information about "all" changes to "all"
   columns in any given row but only the last change to each column made
   prior to the time data was extracted.

I think Mike's simple approach to move/copy of log data could be easily
modified to add "change data capture" functionality to Derby and put Derby
at the forefront of usability for data warehouse support. This would allow
the log files to be "data mined" to extract only the data of tables of
interest while capturing "all" of the changes for time dimension modeling.
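
A minimal Python sketch of that kind of log mining follows. It assumes the archived log has already been decoded into simple change records; the `Change` type, its fields and the `capture_changes` function are hypothetical names for this example, not an existing Derby facility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    lsn: int
    table: str
    op: str              # "INSERT", "UPDATE" or "DELETE"
    key: int
    row: Optional[dict]  # new row image; None for a DELETE


def capture_changes(archived_log, tables_of_interest, after_lsn=0):
    """Return every change to the chosen tables with an LSN above after_lsn.

    after_lsn lets an ETL job resume from where its last extract stopped.
    """
    wanted = set(tables_of_interest)
    return [c for c in archived_log if c.table in wanted and c.lsn > after_lsn]
```

Unlike the audit-column approach, an extract of this kind sees deletes and every intermediate update to a row, and it needs no indexes or triggers on the source tables.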
