jackrabbit-users mailing list archives

From "Torsten Curdt" <tcu...@apache.org>
Subject Re: non-hierarchical content
Date Fri, 19 Dec 2008 14:27:09 GMT
>> So would you advise against using the UUIDs in the multi-value field?
>
> You could and you can have referential integrity with it (using the
> REFERENCE property type) - to already answer your question about ref.
> integrity from below. But as JCR does not have something like a "On
> Delete Cascade", you will have to manually remove all the references
> to a target node before you can delete it.

As long as you easily get to them that should be alright :)

> But why do you need referential integrity for tagging? It's not a
> critical thing.

Well, it doesn't have to be critical for one to expect consistency :)

> And imagine you have different rights for tags and content, ie. there
> is some content that a tagging administrator (aka the "librarian")
> cannot access (eg. the personal files of the boss). What if he wants
> to delete a tag (aka also delete all its references on the content)?
> He can't, so you need a solution for a smooth transition anyway (mark
> for delete and delete later when the content owner touches his content
> for example).
>
>> But that is awkward. You will end up with tags like this
>>
>> /tags/oldname/my:title = 'newname'
>>
>> That's confusing more than it helps.
>
> Why? The title could also be something that does not fit into the JCR
> node name constraints, eg. "Apple Pie".

Could - but imagine just this rename

/tags/theveniceproject/my:title = 'The Venice Project'

to

/tags/theveniceproject/my:title = 'Joost'

While surely not a real world example this will be confusing for
anyone who explores the JCR directly. Here the path is not any better
than using an id there. In fact it's even more confusing.

>> While we are on that: As JCR is quite a bit like a filesystem - are there
>> native ways of doing a "symbolic link"?
>
> JCR 2.0 will introduce "shareable nodes" which can be seen as hard links.

Nice!

> For symbolic links you have to resolve them yourself (but you can use
> the PATH property type to ensure the character constraints for JCR
> paths).

OK

>> And if so - how does that work with versioned resources?
>
> Not sure, but I think it will always reference the current version of
> a target node.

Hm ... so to point to a certain version I am also stuck with using the
UUID and resolving that myself.


>> I think I prefer the lazy show of tags and then prune on the next write maybe.
>> So the "weak reference" model. But lack of repository based
>> referential integrity assurance is quite a turn off.
>
> Have you looked at rule 5 of David's model:
> http://wiki.apache.org/jackrabbit/DavidsModel#head-ed794ec9f4f716b3e53548be6dd91b23e5dd3f3a
>
> It explains why in a JCR you should always try to avoid referential
> integrity, because it will always put stones in the way for all the
> other benefits you get by using JCR.

Indeed I read that - but for my use case I do see that as a problem.
Let's look at the requirements:

1. I want to tag documents in JCR
2. I need to get a cloud of all tags (that is a set of the tags and
the count of usage per tag)
3. I (rarely) need to be able to rename tags
4. I need to be able to delete tags

So purely following rule 5 means:

1. store the tags as a multi-value property on the document - using their names
2. query all documents for their tags and build a set/map and count
the occurrences (could be done lazily)
3. query all documents for their tags and replace the tag name for all
matched documents
4. query all documents for their tags and delete the tag for all
matched documents
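
To make the cost of those four steps concrete, here is a rough sketch.
It is not JCR code: the repository is replaced by a plain in-memory map
from document path to the values of its multi-value tag property, and
the class and method names (TagScan, cloud, rename, delete) are made up
for illustration. The point is only that the cloud, the rename and the
delete each have to visit every document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the repository:
// document path -> values of its multi-value tag property.
public class TagScan {

    // Step 2: visit every document and count the usages per tag name.
    static Map<String, Integer> cloud(Map<String, List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : docs.values()) {
            for (String tag : tags) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Step 3: visit every document and replace the old tag name.
    // Returns the number of documents changed. No de-duplication is
    // done if a document already carries newName.
    static int rename(Map<String, List<String>> docs, String oldName, String newName) {
        int touched = 0;
        for (List<String> tags : docs.values()) {
            if (tags.contains(oldName)) {
                tags.replaceAll(t -> t.equals(oldName) ? newName : t);
                touched++;
            }
        }
        return touched;
    }

    // Step 4: visit every document and drop the tag name.
    // Returns the number of documents changed.
    static int delete(Map<String, List<String>> docs, String name) {
        int touched = 0;
        for (List<String> tags : docs.values()) {
            if (tags.removeIf(t -> t.equals(name))) {
                touched++;
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new HashMap<>();
        docs.put("/docs/a", new ArrayList<>(Arrays.asList("jcr", "java")));
        docs.put("/docs/b", new ArrayList<>(Arrays.asList("jcr")));

        System.out.println(cloud(docs));                  // prints {java=1, jcr=2}
        System.out.println(rename(docs, "jcr", "jsr170")); // prints 2
        System.out.println(delete(docs, "java"));          // prints 1
        System.out.println(cloud(docs));                  // prints {jsr170=2}
    }
}
```

In a real repository the full scan would presumably be a query on the
tag property rather than a loop, but every matching document still has
to be written for a rename or a delete.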

Now with all of these being O(n) operations over the documents it's
quite obvious that this cannot scale particularly well. You outlined
the less strict version (still conforming to rule 5) using paths in
this thread, which will scale better but might end up with dangling
references.

While the dangling references are mostly no problem in general, you
still somehow need a janitor to clean up your data to get rid of the
cruft.
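
Such a janitor could look roughly like the following sketch. Again this
is not JCR code: the set of existing tag paths and the map from
document path to stored tag paths are in-memory stand-ins, and the
names (TagJanitor, prune) are made up for illustration. It assumes the
path-based model where a document stores tag paths as weak references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical janitor over an in-memory stand-in for the repository.
public class TagJanitor {

    // Removes every stored tag path that no longer resolves to an
    // existing tag. Returns the number of dangling references dropped.
    static int prune(Map<String, List<String>> docs, Set<String> existingTags) {
        int dropped = 0;
        for (List<String> refs : docs.values()) {
            int before = refs.size();
            refs.removeIf(path -> !existingTags.contains(path));
            dropped += before - refs.size();
        }
        return dropped;
    }

    public static void main(String[] args) {
        Set<String> existingTags = new HashSet<>(Arrays.asList("/tags/jcr"));

        Map<String, List<String>> docs = new HashMap<>();
        docs.put("/docs/a", new ArrayList<>(Arrays.asList("/tags/jcr", "/tags/gone")));
        docs.put("/docs/b", new ArrayList<>(Arrays.asList("/tags/gone")));

        System.out.println(prune(docs, existingTags)); // prints 2
        System.out.println(docs.get("/docs/a"));       // prints [/tags/jcr]
    }
}
```

The same check could also run lazily per document on its next write
instead of as a sweep over everything.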

Not having this built into JCR itself is a bit of a step back, as now
the application developer has to take care of it. So consistency is
no longer a contract of the repository itself.

Whether this is a good or a bad thing probably really depends. Or what
am I missing here?

cheers
--
Torsten
