cayenne-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "james" <...@jensenplanet.com>
Subject Temporal Database (Almost)
Date Sat, 04 Aug 2007 20:42:08 GMT
I have seen several discussions on implementing temporal databases. I am working on an application
that needs something very similar, but slightly different from what I have seen discussed
on the list. I would like a "repository" database that allows me to provide versioning and
allow collaboration. I would like to structure it conceptually similarly to SVN or CVS. I
would like to use some of Cayenne's abilities to accomplish this.



If anyone has some spare cycles to help, I am looking to hire a qualified Cayenne developer
for some part-time work. Follow this link to job posting 266326 on Odesk.com. http://www.odesk.com/console/j/opening.php?key=8bd7d42fda97bfde&tot=1&pos=0

(By the way, I think Odesk does a nice job providing a good contracting environment and have
had a good experience so far.)




If you are just curious about what I am thinking, I have included a long-winded explanation
of my initial ideas. If you are at all interested in this or have tried similar things, I
would like to know what you may think.

James Jensen


... ... ... ... ...

(Understand that this is a first pass, and is intended as a starting point)


The databases that make up this system should behave similar to a repository such as SVN or
CVS. There is one "central" repository database, and multiple client databases. A client can
only connect to one repository, and a reasonable mechanism should be in place to prevent normal
users from connecting a client to a different repository than the one for which it was originally
created. 




All modifications by users are made to the client database. There will usually be one per
laptop/user, although occasionally there could be more than one user. Each client application
should have a password login process that identifies the user, and selects the correct local
client db that is used for that user. (The application should also have a roll based security
system to restrict/allow specific activities, like sales, installer, scheduler, admin., etc)
For each workstation, a user must have a separate user db. I.e. Each client is identified
by user and workstation id. When a user logs into a new workstation, that user must create
a new client from the repository, and cannot directly access data on a different workstation
that has not been synced. Information never flows directly from client to client.




Modifications to the repository are made by a client "syncing" with the repository. Like SVN,
the client must first update any and all repository changes back to the local client by merging
and choosing all changes. When all merges are complete and client has updated to the latest
repository version, it is allowed to commit its changes to the repository. The commit process
should be atomic, and the first one wins. I.e. If two commits from different clients start
at the same time, the first to get the lock, checks repository version, does its update, which
causes the repository version to advance, the second client's commit will fail because of
the version mismatch, so the client will need to update again and then attempt to commit again.




The repository is intended to be a fairly passive part of the system. It consists of a database
and some locking mechanism to prevent concurrent access. It is intended that the repository
be some sort of web service (Like Cayenne makes possible). Connections from client to repository
being made via web. The repository is only really visible through the clients. All application
logic is managed by the client.




Versioning: 




The repository behavior is enabled by adding a few extra columns to most tables in the database,
in addition to the ID column. Rows will be fully identified by ID, BIRTH_VER, RETIRED_VER,
and CLIENT_ID. (Assuming Cayenne is used as the ORM, ID will actually be created by some custom
code to combine the standard auto-increment ID plus the CLIENT_ID added as some upper bits
to ensure that ID is always unique across the repository without requiring that the CLIENT_ID
column always be used for all queries.)




The "current" version is a concept that applies to the entire database. Each table does not
maintain it's own notion of current version. Even and odd versions are used to differentiate
between synced data and client modified data. Even indicates data synced with the repository.
Odd versions have been modified since the last sync.




When a record is created, BIRTH_VER is set to current version (some odd client ver) and RETIRED_VER
is set to max-value or some other really big value called NOT_RETIRED that logically represents
"beyond" the current version. 




When a record is deleted, a check is made to see if that retired version already exists, if
it does, then the deleted data is just copied over the data with the same retired version.
If a retired record doesn't exist yet with that version, retired is set to current version
. When a record is modified, the original record is treated like a delete (some odd number),
and the record is cloned with same id, with the death set to NOT_RETIRED. The birth could
be set to current, but it would work even if it were not, queries would just be harder.



It is possible to "save" a version by simply incrementing the current version number to the
next odd number, thus protecting all existing retired data from further changes and a new
layer of history has been started. Also queries for the current snap shot just have to test
for retired == NOT_RETIRED. 




When a client is updated. It uses its last updated even version as a base line and looks for
any newer versions. If they exist, entries from tables of that version are copied to the client
(maybe to tmp tables). Merging is not as simple as text merges, and requires special code
that takes into account the specific tables being merged and presents appropriate choices
to the user. As pieces of the merge are made they are give a new version number. When a successful
set of merge data has been chosen/created, then the data is accepted as a new odd version
and the last updated version is changed to the even version of the repository. To summarize,
an update consists of first downloading a tmp version of latest modified records from the
latest repository state, attempt to merge, if a successful merge is created, it can be accepted,
or the merge process can just be rolled back a version and thrown away.




When a client wants to commit, if the "updated" version matches and no new odd versions have
been created, then the client data is copied over to the repository, but with any odd version
numbers moved to the "next" even version. I.e. Only records with even versions makeup the
"official" repository. By committing, a version is being saved in the repository. (The original
odd version records can also be copied to the repository and replicated to other clients if
it is desired to give clients the opportunity to see all the versions used to merge into the
repository. However, this expands the amount of data stored even further.)



Thanks for your interest and patience if you made it this far.


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message