jackrabbit-dev mailing list archives

From Daniel Dekany <ddek...@freemail.hu>
Subject Re: Getting "custom" objects from the repository?
Date Sat, 16 Apr 2005 23:41:40 GMT
Saturday, April 16, 2005, 11:35:05 PM, Edgar Poce wrote:

> Hi Daniel
>
> Daniel Dekany wrote:
>> I would happily build such a framework, but I don't see how... JCR
>> nodes don't even have some kind of automatically maintained
>> last-modified property that I could use for quickly checking whether
>> the object in the cache is outdated. That is almost everything that is
>> needed for happiness. It seems to me such a low-hanging fruit...
>>
>> Node n = (Node) session.getItem("/foo/theTemplate");
>> CacheEntry cacheEntry = cache.get("/foo/theTemplate");
>> // getStamp() is the hypothetical method being asked for here;
>> // javax.jcr.Node has no such method.
>> if (cacheEntry == null || !n.getStamp().equals(cacheEntry.getStamp())) {
>>     // The cached object is outdated, so let's recreate it using the
>>     // current value of the insertsomethinghere property.
>> } else {
>>     return cacheEntry.getObject();
>> }
>>
>
> I guess you can do it by creating a custom node type with a mandatory 
> property that stores the last update timestamp. WDYT?

That's not good, because:

a)

If all modifications are made through my CMS (which handles the
"mapping"), then I can ensure that mapping:lastUpdate is always
changed when I change a property of the node. But a repository is not
only accessed through the interface of the CMS. It's a central content
repository that is read and written by various tools. So this solution
works for some users, but it goes against the idea of an "enterprise"
content repository, since it wouldn't work there.
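To make objection a) concrete, here is a minimal in-memory simulation, not JCR code. The classes `PlainNode`, `Cms` and `StampTrustingCache` are hypothetical, and the cache holds a single node/property for brevity. The CMS bumps its own `mapping:lastUpdate` stamp on every write, but a second tool writing the same property directly does not, so a cache that trusts the stamp keeps serving the old value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a repository node: a bare property map with no
// automatically maintained stamp of any kind.
class PlainNode {
    final Map<String, String> props = new HashMap<>();
}

// Hypothetical CMS that maintains its own "mapping:lastUpdate" stamp,
// because the repository does not do it.
class Cms {
    private long clock = 0;

    void write(PlainNode node, String name, String value) {
        node.props.put(name, value);
        node.props.put("mapping:lastUpdate", Long.toString(++clock));
    }
}

// Cache that trusts the stamp: it re-reads the property only when
// mapping:lastUpdate differs from the value it saw last time.
class StampTrustingCache {
    private String seenStamp;
    private String cachedValue;

    String get(PlainNode node, String name) {
        String stamp = node.props.get("mapping:lastUpdate");
        if (cachedValue == null || !Objects.equals(stamp, seenStamp)) {
            seenStamp = stamp;
            cachedValue = node.props.get(name); // stands in for "recreate the object"
        }
        return cachedValue;
    }
}
```

If another tool calls `node.props.put("jcr:content", "v3")` after the CMS wrote `"v2"`, the stamp is unchanged and `get` still returns `"v2"`.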

b)

I believe that what I'm talking about will turn out to be a very common
task, and I will return to this topic later in this mail. Here I'm just
saying that there should be a correct, standard, easy way of doing this.
For example, the specification should introduce nt:monitored (never mind
the poor name choice for now...), which would mean that the node
has these properties:

- jcr:uuid
- jcr:modificationCounter, which is automatically created and initialized to
  0, and whose value is *automatically* incremented whenever a property of
  the node is written. Yeah, there are a lot of unclear things about this;
  it was just a quick starting-point idea.
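The proposed semantics can be modelled in a few lines. This is a hypothetical in-memory sketch, not a JCR API: `MonitoredNode` and its methods are invented for illustration. The only point it shows is that jcr:uuid is fixed at creation while jcr:modificationCounter starts at 0 and is incremented by the storage itself on every property write, so no client can forget to update it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the proposed nt:monitored node type.
class MonitoredNode {
    private final String uuid = UUID.randomUUID().toString(); // fixed for the node's lifetime
    private long modificationCounter = 0;                    // starts at 0
    private final Map<String, Object> props = new HashMap<>();

    String getUuid() { return uuid; }

    long getModificationCounter() { return modificationCounter; }

    Object getProperty(String name) { return props.get(name); }

    // Every writer (the CMS or any other tool) goes through this method,
    // so the counter is maintained by the storage, not by client code.
    void setProperty(String name, Object value) {
        props.put(name, value);
        modificationCounter++;
    }
}
```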

This would be enough to implement the client-side object cache ("mapping")
that I have talked about. If the uuid or modificationCounter of the node
returned for "foo/index" has changed, then the cached object shouldn't be
used. Furthermore, I think that there should be a method in the JCR API
for this check, so implementations can optimize this (IMO) frequent
task.
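A minimal sketch of that cache check, assuming the uuid and counter can be read from the repository (neither jcr:modificationCounter nor the `Stamp`/`MappedObjectCache` classes exist in JCR; they are hypothetical). A different uuid means the node at the path was replaced; a different counter means it was modified in place. Either way the mapped object is rebuilt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical (uuid, modificationCounter) pair read from the node.
final class Stamp {
    final String uuid;
    final long counter;

    Stamp(String uuid, long counter) {
        this.uuid = uuid;
        this.counter = counter;
    }

    boolean sameAs(Stamp other) {
        return other != null && counter == other.counter && Objects.equals(uuid, other.uuid);
    }
}

// Client-side cache of mapped objects, keyed by repository path.
final class MappedObjectCache<T> {
    private static final class Entry<V> {
        final Stamp stamp;
        final V object;

        Entry(Stamp stamp, V object) {
            this.stamp = stamp;
            this.object = object;
        }
    }

    private final Map<String, Entry<T>> entries = new HashMap<>();

    // Returns the cached object if the node's stamp is unchanged; otherwise
    // builds a fresh object with the factory and caches it under the new stamp.
    T get(String path, Stamp current, Supplier<T> factory) {
        Entry<T> e = entries.get(path);
        if (e != null && e.stamp.sameAs(current)) {
            return e.object;
        }
        T fresh = factory.get();
        entries.put(path, new Entry<>(current, fresh));
        return fresh;
    }
}
```

The factory is where the expensive work (parsing a template, building a DOM) would go; it runs only when the stamp changes.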

Something like this could be an optional feature. Then it would turn out
whether customers want JCR implementors to support this feature or
not.

> the first line of your example would be something like:
> Property p = (Property) 
> session.getItem("/foo/theTemplate/mapping:lastUpdate");
>
> But since graffito, lenya and jackrabbit communities are interested in
> such a tool it would be cool to work together, that would be the apache
> way, right? :). AFAIK the proposal discussed in the graffito dev list 
> deals with many of the issues you are talking about, but I didn't see any
> reference to a cache with already mapped objects.

Excuse me for being a smart aleck and telling you what I think about the
whole issue, and about the (apparent?) lack of comprehension of it:

People have been using RDBMS-es for ages, and later they built "object
mapping" layers, like Hibernate, on top of what they already had. Now
with JCR it seems that people think it's OK to say that JCR is analogous
to JDBC, and that since Hibernate works relatively well, the same trick
(adding an object mapping layer later) will work in the case of JCR as
well. I believe this is short-sighted thinking.

Look at Hibernate: how and why it works without the caching problems I'm
crying about here. You run a query and you get bean instances (instead
of the low-level ResultSet). Whenever you run a query, new bean
instances are created. There is no caching (and thus no cache that can
go out of sync with the storage). No caching is needed because creating
a new simple bean and setting its properties costs nothing significant
compared to what the DB query eats. (And if you worry about too much
garbage, you can use instance pools: re-setting property values is easy,
so you can reuse instances.) Yes, this will surely work with JCR as well.

The problem is that if we are talking about content repositories,
CMS-es and the Web, then it will turn out that there is a strong
tendency to store big, complex objects in the storage (usually as
BLOB-s, that is, binary properties, or as long string properties), not
just simple tables that can be easily modelled with those lightweight
beans. Again, look at what's stored in a ZODB (Zope)... lots of objects
that you definitely don't want to shamelessly instantiate and then very
soon drop, again and again, like you did with those beans. For example,
you store templates there, and scripts... Creating a template or a
script object eats many resources, not only because these objects are
big and complex, but because creating the instance may involve tasks
like parsing the template/script (written in the Velocity language,
Groovy, etc.). The same goes for other "serious" objects as well, like
XML documents (you will store one in a string or binary property, but
you will certainly have to parse it into a DOM tree before you can
actually use it). A content repository is a bit like the file system on
your HDD: you store many big/complex files there (lots of BLOB-s, in
RDBMS wording), not just a heap of simple tables.

Furthermore, since these objects are relatively big in the storage, just
getting them from the content repository is expensive in itself because
of the I/O needed (and/or because of the memory needed if they are in
the RAM cache). Instead, you should get only the values of jcr:uuid and
jcr:modificationCounter (see above), and get the primary property
(typically jcr:content) only if they have changed.
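Here is a sketch of that access pattern, under the assumption that the repository lets a client read the two small values separately from the big primary property. `ContentSource` and `LazyTemplateCache` are hypothetical names, not JCR interfaces, and `parse` is only a placeholder for real template or DOM work. An unchanged node costs two small reads; the content read and the parse happen only after a change.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical repository facade separating cheap metadata reads from the
// expensive read of the primary property.
interface ContentSource {
    String uuid(String path);               // proposed jcr:uuid value
    long modificationCounter(String path);  // proposed jcr:modificationCounter
    byte[] primaryContent(String path);     // e.g. a large jcr:content binary
}

// Caches parsed objects per path; calls primaryContent() only when the uuid
// or the counter differs from what was seen when the object was built.
final class LazyTemplateCache {
    private final ContentSource source;
    private final Map<String, String> uuids = new HashMap<>();
    private final Map<String, Long> counters = new HashMap<>();
    private final Map<String, String> parsed = new HashMap<>();

    LazyTemplateCache(ContentSource source) {
        this.source = source;
    }

    String get(String path) {
        String uuid = source.uuid(path);
        long counter = source.modificationCounter(path);
        if (parsed.containsKey(path)
                && uuid.equals(uuids.get(path))
                && counters.get(path) == counter) {
            return parsed.get(path); // unchanged: no content I/O, no parsing
        }
        String template = parse(source.primaryContent(path)); // expensive path
        uuids.put(path, uuid);
        counters.put(path, counter);
        parsed.put(path, template);
        return template;
    }

    // Placeholder for real work such as Velocity/Groovy parsing or DOM building.
    private static String parse(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).trim();
    }
}
```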

Last but not least, these objects in the repository are usually changed
seldom (consider how often you modify page templates compared to the
records that store the current stocks), so it is obvious that you
should cache them inside your CMS, or inside whatever else is the client
of the content repository. These objects are used frequently (you
will run the page template of a frequently visited page a lot), and
modified seldom.

-- 
Best regards,
 Daniel Dekany

