jackrabbit-users mailing list archives

From "Torsten Curdt" <tcu...@apache.org>
Subject Re: non-hierarchical content
Date Fri, 19 Dec 2008 18:04:25 GMT
>> As long as you easily get to them that should be alright :)
> Node.getReferences()

That is indeed easy :)

>> Well, it doesn't have to be critical for one to expect consistency :)
> My lazy delete/rename approach is not creating inconsistent data.

Well, not as long as you handle the dangling references in your
application properly. Which was my point.

>> imagine just this rename
>> /tags/theveniceproject/my:title = 'The Venice Project'
>> to
>> /tags/theveniceproject/my:title = 'Joost'
>> While surely not a real world example this will be confusing for
>> anyone who explores the JCR directly. Here the path is not any better
>> than using an id there. In fact it's even more confusing.
> No, even your example is not confusing, since both names relate to the
> same entity in the end (given one knows both names for Joost). Apart
> from that, with my title approach the user would never see the node
> name, but always the title.

When the application resolves the title - sure.
But the point of rule 5 was that the path itself is more expressive like that.

> And for someone who explores the
> repository (developer or administrator) the path
> /tags/theveniceproject makes more sense than a (UU)ID.

...well only when he knows that 'Joost' was 'The Venice Project'
before. Otherwise there is no difference to a (UU)ID at all.

If someone wants to "delete node 'blue'" then the path is not good
enough. You will have to check the titles, because it could well be
that there exists:

/tags/blue/my:title = 'green'
/tags/green/my:title = 'blue'

While this should not happen I have seen too many stupid things and
would not rule this out just like that.

So access through the application is fine. But IMO rule 5 does not
really work well with the tagging example.
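
The blue/green case above can be sketched in Java. This is only an
illustration: a plain in-memory map stands in for the repository, and
the class and method names (TagLookup, pathsByTitle) are made up for
this example, not part of the JCR API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagLookup {

    // Returns the paths of all tags whose my:title equals the given title.
    // The map (node path -> my:title) is a hypothetical stand-in for the
    // repository content under /tags.
    static List<String> pathsByTitle(Map<String, String> titlesByPath, String title) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : titlesByPath.entrySet()) {
            if (entry.getValue().equals(title)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("/tags/blue", "green");
        tags.put("/tags/green", "blue");

        // The node *named* blue is not the tag *titled* blue.
        System.out.println(pathsByTitle(tags, "blue")); // prints [/tags/green]
    }
}
```

So "delete 'blue'" has to go through the title, and the node name alone
says nothing reliable.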

>> Hm ...so to point to a certain version I am also stuck with using the
>> UUID and resolve that myself.
> Are you sure the relation you have is automatically "lost" with each
> new version of the content? I am not denying that such a use-case
> might exist, but IMHO it's a 1% case.

No - not lost. But imagine an article history like this:

version2 (published)
version3 (draft)

So in order to find the published content you would have to look up
the published UUID (or version). In pseudo code:

published = /articles/article1/@published
publishedArticle = /articles/article1[version == $published]

...or duplicate the content for the publication.
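
The two-step lookup from the pseudo code can be written out as a small
Java sketch. It does not use the JCR versioning API. The Article class,
its fields and publishedContent() are assumptions for illustration, with
a map standing in for the version history.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class PublishedLookup {

    // Hypothetical stand-in for /articles/article1 and its version history.
    static final class Article {
        // version name -> content of that version
        final Map<String, String> versions = new LinkedHashMap<>();
        // plays the role of the @published property; null if nothing is published
        String published;
    }

    // Step 1: read the published pointer. Step 2: resolve it to a version.
    static Optional<String> publishedContent(Article article) {
        if (article.published == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(article.versions.get(article.published));
    }

    public static void main(String[] args) {
        Article article = new Article();
        article.versions.put("version2", "published text");
        article.versions.put("version3", "draft text");
        article.published = "version2";

        System.out.println(publishedContent(article).orElse("<none>")); // prints published text
    }
}
```

The indirection is the point here: the latest version (the draft) is
not what readers should see, so the application has to resolve the
pointer itself.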

>> Now with this all being O(n) operations it's quite obvious that this
>> cannot scale particularly well. You outlined the less strict version
>> (still conforming to rule 5) using paths in this thread that will
>> scale better but might end up having dangling references.
> But these dangling references are not a problem if it is a rule that
> they count as non-existent if they cannot be resolved (because the
> target node no longer exists). That's what I meant with using that for
> displaying the title anyway.

Right - but my point was that you have to express all that in your
application's code.
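
To make "express all that in your application's code" concrete, here
is a minimal Java sketch of the rule that unresolvable references count
as non-existent. Again a map stands in for the repository, and
LazyReferences and resolveTitles are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyReferences {

    // Resolves path references to tag titles. A path that no longer
    // resolves (the target was deleted or renamed) is silently skipped,
    // i.e. it is treated as if the reference did not exist.
    static List<String> resolveTitles(List<String> tagPaths, Map<String, String> titlesByPath) {
        List<String> titles = new ArrayList<>();
        for (String path : tagPaths) {
            String title = titlesByPath.get(path);
            if (title != null) {
                titles.add(title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new HashMap<>();
        tags.put("/tags/theveniceproject", "Joost");

        // The second reference dangles because its target is gone.
        List<String> refs = List.of("/tags/theveniceproject", "/tags/deleted");
        System.out.println(resolveTitles(refs, tags)); // prints [Joost]
    }
}
```

Every place that reads such references needs this check, which is
exactly the burden that moves from the repository to the application.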

>> While the dangling references are mostly no problem in general, still
>> you somehow need a janitor to clean up your data to get rid of the
>> cruft.
>> Not having this built into JCR itself is a bit of a step back as now
>> the application developer has to take care of this. So consistency is
>> no longer a contract of the repository itself.
> Even RDBMS cannot guarantee 100% integrity, so dangling references are
> something that an application should expect.

Huh? That's what foreign key constraints are for. Those things you
hate when you just want to get rid of this one row and have to touch
table after table ;)

> An application that
> expects corrupt data and can handle it will have more up-time than one
> that completely breaks in that situation.

True ... I still think it would be nice to at least be able to have
this somehow be integrated into JCR as well. Not necessarily as
constraints like with databases. But give it some time and you will
want to clean up the data mess that has piled up. Would be nice if
there was a standard way of cleaning that up. Maybe even just lazily.
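
As a rough idea of what such a janitor would do, here is a Java sketch
of a cleanup pass over path references. Nothing like this exists in JCR
as far as this thread goes. ReferenceJanitor, removeDangling and the
in-memory maps are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReferenceJanitor {

    // Removes every reference whose target path no longer exists and
    // returns how many references were dropped. The reference lists must
    // be mutable because they are cleaned up in place.
    static int removeDangling(Map<String, List<String>> refsByContent, Set<String> existingTags) {
        int removed = 0;
        for (List<String> refs : refsByContent.values()) {
            int before = refs.size();
            refs.removeIf(path -> !existingTags.contains(path));
            removed += before - refs.size();
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        refs.put("/articles/a1", new ArrayList<>(List.of("/tags/blue", "/tags/gone")));
        refs.put("/articles/a2", new ArrayList<>(List.of("/tags/gone")));

        int removed = removeDangling(refs, Set.of("/tags/blue"));
        System.out.println(removed); // prints 2
        System.out.println(refs);    // prints {/articles/a1=[/tags/blue], /articles/a2=[]}
    }
}
```

A lazy variant would run the same check on a single list whenever it is
read or written, instead of sweeping the whole repository at once.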

