jackrabbit-users mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: DM Rule #5: References considered harmful.
Date Wed, 11 Jul 2007 09:22:33 GMT

On 7/7/07, David Nuescheler <david.nuescheler@gmail.com> wrote:
> References imply referential integrity. I find it important to
> understand that references do not just add additional cost for the
> repository managing the referential integrity, but they also are
> costly from a content flexibility perspective.
> Personally I make sure I only ever use references when I really cannot
> deal with a dangling reference and otherwise use a path, a name or a
> string UUID to refer to another node.

Since there are cases where references are really useful, I would
rather label this rule "Moderate use of references" than imply
that all references are harmful.

I would actually argue that similar flexibility costs arise regardless
of whether you use hard or soft references. The main problem is that
you are going beyond the native hierarchy model, and this will always
have inherent costs. In fact, when you really need to do that, using
hard references is IMHO an acceptable and good solution.

However, here are my rules of thumb when dealing with references (be
they hard references, paths, UUID strings, or even just naming
conventions) between separate subtrees:

1) Don't do that. If there's a way to make all your content
hierarchical, use it. This applies, for instance, to the blog post vs.
comments case in David's example #1.

2) If there's an alternative, use it. If there already is some natural
unique identifier in the target node (usernames are typical examples),
then it makes sense to use that instead of a reference or path
property (for example you want to store the username of a comment
author instead of the UUID or path of a user node).

3) When no alternatives exist, use references just in one direction.
If I have two subtrees A and B whose content I need to interlink, then
it makes sense to have all references going from A to B or vice versa
but not in both directions. This makes it possible to manage (back up,
migrate, etc.) at least one subtree without worrying about the other.
Note that this is how the version store works in JCR: the version
history references from normal content always point into
/jcr:system/jcr:versionStorage, but the version store has no
references back to normal content.

4) If you really need references in both directions between two
subtrees, manage those subtrees as a part of a larger tree. If two
subtrees are hopelessly intertwined (and you can't find a way to merge
them), then it's better if you don't think of them as separate trees
but rather as branches of a bigger tree. This way you can still back up
and migrate the whole tree without problems due to dangling references.
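
As a rough illustration of rule 3, here is a minimal sketch of how one
might check that the references between two subtrees only go one way.
It is plain Java over an in-memory map and does not use the JCR API:
the class ReferenceDirectionCheck, its methods, and the example paths
are all hypothetical, and it assumes the two subtrees are disjoint and
that you have already collected the references as "source path ->
target paths".

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of JCR or Jackrabbit. */
public class ReferenceDirectionCheck {

    /** True if path is the subtree root itself or lies somewhere below it. */
    public static boolean isUnder(String path, String root) {
        return path.equals(root) || path.startsWith(root + "/");
    }

    /** Counts references whose source is under 'from' and whose target is under 'to'. */
    public static long countRefs(Map<String, List<String>> refs, String from, String to) {
        return refs.entrySet().stream()
                .filter(e -> isUnder(e.getKey(), from))
                .flatMap(e -> e.getValue().stream())
                .filter(target -> isUnder(target, to))
                .count();
    }

    /** Rule 3: references between a and b may go one way, but not both. */
    public static boolean isOneDirectional(Map<String, List<String>> refs, String a, String b) {
        return countRefs(refs, a, b) == 0 || countRefs(refs, b, a) == 0;
    }

    public static void main(String[] args) {
        // Assumed sample data: blog posts pointing at user nodes, nothing pointing back.
        Map<String, List<String>> refs = Map.of(
                "/content/blog/post1", List.of("/users/jukka"),
                "/content/blog/post2", List.of("/users/david"));
        System.out.println(isOneDirectional(refs, "/content", "/users"));
    }
}
```

If such a check reports references in both directions, that would be a
hint to apply rule 4 and treat the two subtrees as branches of one
larger tree.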


Jukka Zitting
