jackrabbit-users mailing list archives

From "Alexander Klimetschek" <aklim...@day.com>
Subject Re: non-hierarchical content
Date Fri, 19 Dec 2008 16:27:40 GMT
On Fri, Dec 19, 2008 at 3:27 PM, Torsten Curdt <tcurdt@apache.org> wrote:
>> You could and you can have referential integrity with it (using the
>> REFERENCE property type) - to already answer your question about ref.
>> integrity from below. But as JCR does not have something like a "On
>> Delete Cascade", you will have to manually remove all the references
>> to a target node before you can delete it.
> As long as you easily get to them that should be alright :)
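
The "remove all references first" step can be modelled in a few lines. In JCR the referrers of a node are available through `Node.getReferences()`. The sketch below is not that API: it is a plain-Java, in-memory stand-in, and `RefStore` and all of its methods are hypothetical names chosen for illustration.

```java
import java.util.*;

// In-memory stand-in for a store that enforces referential integrity.
// NOT the JCR API; every name here is a hypothetical illustration.
class RefStore {
    private final Set<String> nodes = new HashSet<>();
    // referring document id -> ids of the nodes it references
    private final Map<String, Set<String>> refsByDoc = new HashMap<>();

    void addNode(String id) { nodes.add(id); }

    void addReference(String docId, String targetId) {
        if (!nodes.contains(targetId)) {
            throw new IllegalStateException("no such target: " + targetId);
        }
        refsByDoc.computeIfAbsent(docId, k -> new HashSet<>()).add(targetId);
    }

    // Everything that points at targetId, sorted for stable output.
    List<String> referrers(String targetId) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : refsByDoc.entrySet()) {
            if (e.getValue().contains(targetId)) out.add(e.getKey());
        }
        Collections.sort(out);
        return out;
    }

    // Refuses to delete a node that is still referenced.
    void delete(String targetId) {
        if (!referrers(targetId).isEmpty()) {
            throw new IllegalStateException("still referenced: " + targetId);
        }
        nodes.remove(targetId);
    }

    // The manual "on delete cascade": drop each reference, then the node.
    void deleteCascading(String targetId) {
        for (String doc : referrers(targetId)) {
            refsByDoc.get(doc).remove(targetId);
        }
        delete(targetId);
    }

    boolean exists(String id) { return nodes.contains(id); }
}
```

The cost of `deleteCascading` is proportional to the number of referrers, which is why being able to "easily get to them" matters.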


>> But why do you need referential integrity for tagging? It's not a
>> critical thing.
> Well, it doesn't have to be critical for one to expect consistency :)

My lazy delete/rename approach does not create inconsistent data.

>> Why? The title could also be something that does not fit into the JCR
>> node name constraints, eg. "Apple Pie".
> Could - but imagine just this rename
> /tags/theveniceproject/my:title = 'The Venice Project'
> to
> /tags/theveniceproject/my:title = 'Joost'
> While surely not a real world example this will be confusing for
> anyone who explores the JCR directly. Here the path is not any better
> than using an id there. In fact it's even more confusing.

No, even your example is not confusing, since both names relate to the
same entity in the end (given one knows both names for Joost). Apart
from that, with my title approach the user would never see the node
name, but always the title. And for someone who explores the
repository (developer or administrator) the path
/tags/theveniceproject makes more sense than a (UU)ID.

> Hm ...so to point to a certain version I am also stuck with using the
> UUID and resolve that myself.

Are you sure the relation you have is automatically "lost" with each
new version of the content? I am not denying that such a use-case
might exist, but IMHO it's a 1% case.

> So purely following rule 5 means:
> 1. store the tags as a multi-value property on the document - using their names
> 2. query all documents for their tags and build a set/map and count
> the occurrences (could be done lazily)
> 3. query all documents for their tags and replace the tag name for all
> matched documents
> 4. query all documents for their tags and delete the tag for all
> matched documents
> Now with this all being O(n) operations it's quite obvious that this
> cannot scale particularly well. You outlined the less strict version
> (still conforming to rule 5) using paths in this thread that will
> scale better but might end up having dangling references.
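
To make the O(n) point concrete, steps 2 to 4 quoted above can be sketched against an in-memory model. This is an illustration only, not JCR code: a `Map` from document path to a mutable list of tag names stands in for the multi-value property, and `TagOps` and its methods are hypothetical names.

```java
import java.util.*;

// Model of tags stored by name in a multi-value property on each document.
// docs: document path -> mutable list of tag names. Names are hypothetical.
class TagOps {
    // Step 2: count occurrences of every tag across all documents.
    static Map<String, Integer> count(Map<String, List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : docs.values()) {
            for (String t : tags) counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Step 3: rename a tag by rewriting it on every document that carries it.
    static void rename(Map<String, List<String>> docs, String from, String to) {
        for (List<String> tags : docs.values()) {
            tags.replaceAll(t -> t.equals(from) ? to : t);
        }
    }

    // Step 4: delete a tag by removing it from every document that carries it.
    static void delete(Map<String, List<String>> docs, String tag) {
        for (List<String> tags : docs.values()) {
            tags.removeIf(t -> t.equals(tag));
        }
    }
}
```

Each method walks every document, so counting, renaming and deleting all grow with the number of documents rather than with the number of tags.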

But these dangling references are not a problem if the rule is that
they count as non-existent when they cannot be resolved (because the
target node no longer exists). That's what I meant by resolving the
reference to display the title anyway.
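
A minimal sketch of that rule, assuming documents store tag paths and the tag nodes carry the title. The `Map` of path to title stands in for the repository lookup, and `LazyTags.titles` is a hypothetical helper, not part of JCR.

```java
import java.util.*;

// Resolve tag paths to titles at display time; unresolvable paths are skipped.
// tagTitles models the tag nodes: path -> title property. Names are hypothetical.
class LazyTags {
    static List<String> titles(Map<String, String> tagTitles, List<String> tagPaths) {
        List<String> out = new ArrayList<>();
        for (String path : tagPaths) {
            String title = tagTitles.get(path);
            // A dangling path (target deleted) simply counts as non-existent.
            if (title != null) out.add(title);
        }
        return out;
    }
}
```

Because the title is looked up on every display, renaming a tag means changing one title property, and deleting a tag means removing one node; the documents themselves are never touched.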

> While the dangling references are mostly no problem in general, still
> you somehow need a janitor to clean up your data to get rid of the
> cruft.
> Not having this built into JCR itself is a bit of a step back as now
> the application developer has to take care of this. So consistency is
> no longer a contract of the repository itself.

Even an RDBMS cannot guarantee 100% integrity, so dangling references are
something that an application should expect. An application that
expects corrupt data and can handle it will have more up-time than one
that completely breaks in that situation.
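
If the cruft should still be cleaned up, the janitor mentioned above can be an occasional background pass rather than part of every delete. A rough sketch over the same kind of in-memory model (document path to mutable list of tag paths); `TagJanitor.sweep` is a hypothetical name, and a real implementation would have to query the repository instead.

```java
import java.util.*;

// Occasional cleanup pass: drop tag paths whose target no longer exists.
// existingTagPaths models the tag nodes currently in the repository.
class TagJanitor {
    // Returns the number of dangling paths removed.
    static int sweep(Set<String> existingTagPaths, Map<String, List<String>> docTags) {
        int removed = 0;
        for (List<String> paths : docTags.values()) {
            int before = paths.size();
            paths.retainAll(existingTagPaths); // keep only resolvable paths
            removed += before - paths.size();
        }
        return removed;
    }
}
```

Since readers already ignore unresolvable paths, the sweep can run at any time and can be interrupted without affecting correctness.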


Alexander Klimetschek
