db-ojb-dev mailing list archives

From "Thomas Pöschmann" <t.poeschm...@exxcellent.de>
Subject caching enhancements
Date Mon, 14 Apr 2003 17:11:50 GMT
Hi folks,

below I am writing about some of my ideas regarding 
caching -- maybe some of you have comments on this. I 
would be glad to know what you think.

(thma's comments can be found in [brackets]).



OJB caching enhancements


A common use for a persistence technology (to be more 
specific: an object-relational mapper, ORM) is to open up 
more than one instance of a system inside a Java virtual 
machine. One such use can be found, for example, in a 
web application. A typical web application serves more 
than one client at a time by providing a multi-threaded 
programming model. If an ORM is used inside such a web 
application, the ORM must be designed to access the 
database with more than one client interface at once. This 
can be compared to JDBC database connections: if a JDBC 
database connection were shared across multiple users, 
problems would occur with transaction demarcation. Inside 
the world of ORMs, an example of this is the design of JDO 
(Java Data Objects): a persistence manager factory can 
create "any" number of client interfaces, the so-called 
persistence managers. These persistence managers (PM) can 
be characterized as follows:

- Each web application user should be associated with one 
  persistence manager.
- Each is equipped with a (set of) unique database 
  connection(s). If only one database is accessed, each PM 
  has its own database connection; otherwise one 
  connection per database. Ideally, the persistence manager 
  releases this database connection as often as possible, 
  for example when a transaction has ended.
- All persistence managers should share a database 
  connection pool that re-uses database connections when a 
  PM does not need them anymore, and makes them available 
  for other persistence managers.
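The shared pool from the last bullet might be sketched, in 
a very reduced form, like this (all names here are 
illustrative, not OJB or JDO API):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: a minimal pool that persistence
// managers could borrow connections from when a transaction
// begins, and return them to when the transaction has ended.
class ConnectionPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();

    // Pre-fill the pool with already-opened connections.
    ConnectionPool(Iterable<C> connections) {
        for (C c : connections) idle.addLast(c);
    }

    // Called by a PM at transaction begin; null if exhausted.
    synchronized C acquire() {
        return idle.pollFirst();
    }

    // Called by a PM once its transaction has ended.
    synchronized void release(C connection) {
        idle.addLast(connection);
    }

    synchronized int available() {
        return idle.size();
    }
}
```

A real pool would also grow on demand and validate stale 
connections, but the borrow/return contract is the point 
here.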

In order to speed up object retrieval, a cache should be 
used to short-cut object materialization. A first way of 
doing this is to put objects inside a map (preferably 
weakly referenced) during a query read. Objects are very 
often accessed by their ID: either by following object 
references or by using a "getObjectById". A simple caching 
solution could then look up objects in the cache before 
going to the database for such operations. Ideally the 
persistence technology already has a set of IDs available 
when following object references, but this is not always 
the case.

[thma, regarding last paragraph] all that's given with OJB 
today. [/thma]
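The simple by-ID lookup with a weakly referenced map could 
look roughly like this (an illustrative sketch, not the 
actual OJB cache):

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a by-ID cache with weakly referenced values:
// objects stay cached only while something else still holds
// a strong reference to them, so the cache never prevents
// garbage collection.
class ObjectCache {
    private final Map<Object, WeakReference<Object>> byId = new HashMap<>();

    // Look in the cache first; fall back to "materialize"
    // (a stand-in for the real database read) on a miss or
    // on a cleared reference.
    Object getObjectById(Object id, Function<Object, Object> materialize) {
        WeakReference<Object> ref = byId.get(id);
        Object cached = (ref == null) ? null : ref.get();
        if (cached != null) return cached;
        Object fresh = materialize.apply(id);
        byId.put(id, new WeakReference<>(fresh));
        return fresh;
    }
}
```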

A more advanced caching technology could also access the 
cache during query materialization: it could simply read 
the ID from the JDBC driver before constructing the whole 
candidate object. It is then up to the JDBC driver whether 
only the IDs of the objects are transported to the client, 
or all columns in a JDBC result set row. However, using 
the cache in this way also allows detecting object cycles: 
imagine a person that has addresses, and one of the 
addresses refers to an order which in turn refers back to 
the original person. If a user chooses such an 
instantiation depth, the ORM could load all the 
information at once from the database and materialize the 
whole object graph without looping endlessly.

[thma] That's the way we use the cache today. Even if the 
user chooses to work without a cache we want to maintain 
a minimal cache to cope with cyclic structures during query 
materialization. [/thma]

Caching and multiple users

In order to be able to use the cache from inside a web 
container, each persistence manager should be equipped 
with its own cache. 

[thma] We are currently working on a CachePerBroker 
Implementation that will allow this. [/thma]

That way, a read operation from one persistence manager 
does not affect the transactional state (and the 
transactional isolation, see the term "transaction 
isolation level" from relational databases) of all other 
persistence managers in the same Java virtual machine. The 
cache should be configurable to release all seen objects 
after a transaction or to keep such objects beyond the 
transaction lifecycle. A query must also be configurable 
to use

- no cached objects at all
- cached objects seen by this transaction, by this 
  persistence manager
- cached objects seen by this persistence manager, 
  regardless if they were loaded in this transaction.

As a consequence, each cached object has to be equipped 
with age information that carries the timestamp of the 
transaction that last used (i.e. materialized or read) 
this object.
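Such timestamped entries, combined with the three query 
modes above, might be modeled like this (all names are 
made up for illustration):

```java
// Illustrative sketch: a cache entry tagged with the PM that
// loaded it and the transaction that last touched it, so a
// query can decide whether the entry is admissible under its
// configured cache mode.
class CacheEntry {
    final Object object;
    final long pmId;           // persistence manager that loaded it
    final long transactionId;  // transaction that last read/materialized it

    CacheEntry(Object object, long pmId, long transactionId) {
        this.object = object;
        this.pmId = pmId;
        this.transactionId = transactionId;
    }
}

// The three query configurations from the list above.
enum CacheMode { NONE, CURRENT_TRANSACTION, CURRENT_PM }

class CachePolicy {
    // May a query running as (pmId, txId) reuse this entry?
    static boolean admissible(CacheEntry e, CacheMode mode, long pmId, long txId) {
        switch (mode) {
            case NONE:                return false;
            case CURRENT_TRANSACTION: return e.pmId == pmId && e.transactionId == txId;
            case CURRENT_PM:          return e.pmId == pmId;
            default:                  return false;
        }
    }
}
```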

Ideally, the cache should be one sub-system of an ORM 
that is transaction aware, i.e. receives transaction 
"begin" and "end" events.

Caching between client interfaces: locking and cloning

Whenever a persistence manager uses an object that was 
seen before by a different persistence manager, this 
object could carry dirty data if the reference were 
simply shared. Check out this example:

2002-12-24 08:34pm      PM1 starts a transaction
2002-12-24 08:35pm      PM1 loads object "reindeer3" from the database, with
                        state "tired"
2002-12-24 08:36pm      PM2 starts a transaction
2002-12-24 08:37pm      PM2 searches for "reindeer3" and gets a reference to
                        the "reindeer3" instance used by PM1
2002-12-24 08:38pm      PM1 feeds "reindeer3" and changes its state to
                        "christmas ready"
2002-12-24 08:39pm      PM2 sets the state for "reindeer3" to "sleeping" in
                        order to make sure it is ready for 2002-12-25

You know what? If both PMs committed now, both would 
update a "sleeping" state for the reindeer. To find a 
way out of this, at least three solutions are open to be 
considered:

- Make sure PM2 gets a different copy of reindeer3 - by 
  CLONING the reindeer from the cache 
- Make sure PM2 gets a different copy of reindeer3 - by 
  materializing the reindeer again from the database
- Do not allow using reindeer3 multiple times :(

Please note that a possible conflict when updating 
reindeer3 could be resolved either by the database or by 
the ORM whenever the reindeer gets the state "updated" 
from the perspective of the PM.
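One common way to detect such a conflict, whether in the 
database or in the ORM, is optimistic locking with a 
version number; a minimal sketch of the idea (not 
something OJB prescribes):

```java
// Illustrative sketch of optimistic conflict detection: each
// object carries a version number, and an update only succeeds
// if the version seen at load time is still the current one.
class VersionedObject {
    Object state;
    long version;

    VersionedObject(Object state, long version) {
        this.state = state;
        this.version = version;
    }

    // Returns false (conflict) if someone else committed an
    // update since expectedVersion was read.
    synchronized boolean update(Object newState, long expectedVersion) {
        if (version != expectedVersion) return false;
        state = newState;
        version++;
        return true;
    }
}
```

In the reindeer example, PM2's commit would then fail 
instead of silently overwriting PM1's "christmas ready" 
state.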

As a consequence, each cached object must be equipped with 
age information, and there must be a way to determine if 
this object was loaded by a different persistence manager.

There is not yet a built-in way to clone a graph of 
objects in the Java language. However, there are multiple 
ways to deal with this restriction. One way would be to 
serialize the object (and all objects referenced by it) 
into a byte array stream, as currently done by the OJB 
OTM. This has the disadvantage that the cloning depth is 
not configurable, and all classes must implement the 
Serializable interface.

Another way would be to expose direct access to fields, 
together with a metadata API and the information whether a 
field has or has not been loaded by the persistence 
manager (and thus results in "null"). I have been talking 
about this with a friend of mine, Carl Rosenberger, and he 
had some nice ideas from friends of his. However, they all 
had one drawback: they require a post-compiler. And the 
only "post-compiler" that is currently "stable" and 
"accepted" is JDO (Carl will hate me for putting the words 
"stable", "accepted" and "JDO" in one sentence, but I 
cannot care now). A third way would be to clone the object 
using Java reflection, because every object can be 
decomposed into a set of Java native types -- which can 
easily be cloned.
A fourth way, finally, would allow the user to implement 
cloning on his own, based on his business object model.
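The reflection-based third way might be sketched like this 
(assuming classes with an accessible no-arg constructor; 
Reindeer is a made-up example class, not from OJB):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Illustrative sketch: clone an object by copying every
// declared instance field into a fresh instance via
// reflection. A real implementation would also walk the
// superclass hierarchy and recurse into referenced objects
// down to a chosen depth.
class ReflectiveCloner {
    static <T> T shallowClone(T source) {
        try {
            @SuppressWarnings("unchecked")
            Class<T> type = (Class<T>) source.getClass();
            T copy = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) continue;
                field.setAccessible(true);
                // Primitive values are copied; references are shared.
                field.set(copy, field.get(source));
            }
            return copy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot clone " + source.getClass(), e);
        }
    }
}

// Tiny example class for the sketch (hypothetical).
class Reindeer {
    String name;
    String state;
    public Reindeer() {}  // required for reflective instantiation
}
```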

However, cloning is only needed if the object is not 
always materialized afresh from the database. Together 
with cloning (and object graph serialization from a query) 
goes the feature of specifying a cloning (or object 
loading) depth. Whereas this is quite easy to implement in 
Java for cloning, there can be "slight" problems when 
implementing serialization depth against relational 
databases, as long as (Oracle) stored procedures are not 
used, not every materialization level should cause a 
single SQL statement, and maybe JOINs are not used -- but 
well, that's a different story.
[thma] I'm not a friend of this cloning business. I know 
the JDORI simply uses different cache instances for all 
PersistenceManagers. Those caches are strictly separated 
and will be emptied at transaction end.

IMO that's all that is needed.

I know that TopLink got in serious trouble with their cloning 
cache implementation. A lot of user complained. It's not that 
easy to implement. And I don't know if it is worth the effort.
Needs more discussion. [/thma]

One more step back into the future

The ideas described here are only in the scope of a SINGLE 
Java virtual machine. Why don't we bring them to another 
level, by extending them to ANY Java virtual machine in the 
world ;)

[thma] You are thinking about cache synchronization here?
Did you have a look at JCS already? [/thma]
