jackrabbit-users mailing list archives

From Alessandro Bologna <alessandro.bolo...@gmail.com>
Subject Re: importing jackrabbit into jackrabbit
Date Thu, 26 Apr 2007 15:18:58 GMT
I think the main problem is not this specific case but a more general
one: people who design relational databases always use references (or,
more properly, joins) for data that logically belongs to many entities
but should not be duplicated.

Imagine that you have a company tree, with "positions",  
"departments", "employees", "health plans" etc.
An employee could belong to a department and have a position and a
health plan, but typically you would not make all of those child
nodes of the employee. Instead you would define references to the
proper nodes in the "position" and "health plan" subtrees.
It's easy to see how, in a large company, there could be thousands of
employees holding the same position and health plan, and those
specific nodes ("Secretary" and "Plan A") would have thousands of
references pointing to them.
So, given the issue as explained by Marcel that "whenever a
reference is added that points to a node N the complete set of
references pointing to N is re-written to the persistence manager",
it seems that using references to a very "popular" node is
going to create problems in the long term.
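
To put a rough number on that, here is a minimal sketch in Java. It is
only a cost model under the assumption quoted above (every added
reference rewrites the target's whole reference set), not Jackrabbit
code; the class and method names are made up for illustration.

```java
// Hypothetical cost model, not Jackrabbit code. Assumes that each added
// reference causes the target's complete reference set to be written again.
final class ReferenceRewriteCost {

    // Total reference entries written while adding `count` references,
    // one at a time, to the same target node.
    static long totalEntriesWritten(long count) {
        long total = 0;
        for (long size = 1; size <= count; size++) {
            total += size; // the set holds `size` entries after this addition
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 1_000; // e.g. a thousand employees pointing at "Secretary"
        System.out.println(totalEntriesWritten(n)); // prints 500500
        System.out.println(n * (n + 1) / 2);        // closed form, same value
    }
}
```

Under that assumption the total grows quadratically with the number of
references to one node, which is why the "popular" nodes are the worry.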

What could be the right way to model things? Maybe using a "path"
property to point to the node instead? Of course, it would not be as
easy to use as a reference, and it would require global updates
if the target node ever changes position, but I can't see other options.
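
A small sketch of that trade-off, in plain Java. This is a hypothetical
in-memory stand-in for a repository, not the JCR API: the class, the
"positionPath" property name and the example paths are all assumptions.
It shows that setting a path property touches only the pointing node,
while moving the target means scanning for every stored copy of its path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model, not the JCR API.
final class PathPointerSketch {
    // absolute node path -> (property name -> string value)
    final Map<String, Map<String, String>> nodes = new HashMap<>();

    void addNode(String path) {
        nodes.put(path, new HashMap<>());
    }

    // Writes only the pointing node; the target is not modified.
    void setProperty(String path, String name, String value) {
        nodes.get(path).put(name, value);
    }

    // Follows a path-valued property; returns null if the pointer dangles.
    String resolve(String path, String name) {
        String target = nodes.get(path).get(name);
        return nodes.containsKey(target) ? target : null;
    }

    // Moves a single node (descendants are ignored in this sketch) and
    // rewrites every property that stored its old path.
    // Returns the number of properties that had to be updated.
    int move(String oldPath, String newPath) {
        Map<String, String> moved = nodes.remove(oldPath);
        if (moved == null) {
            throw new IllegalArgumentException("no such node: " + oldPath);
        }
        nodes.put(newPath, moved);
        int updated = 0;
        for (Map<String, String> props : nodes.values()) {
            for (Map.Entry<String, String> e : props.entrySet()) {
                if (oldPath.equals(e.getValue())) {
                    e.setValue(newPath); // the "global update"
                    updated++;
                }
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        PathPointerSketch repo = new PathPointerSketch();
        repo.addNode("/positions/secretary");
        repo.addNode("/employees/alice");
        repo.addNode("/employees/bob");
        repo.setProperty("/employees/alice", "positionPath", "/positions/secretary");
        repo.setProperty("/employees/bob", "positionPath", "/positions/secretary");

        System.out.println(repo.move("/positions/secretary", "/roles/secretary")); // prints 2
        System.out.println(repo.resolve("/employees/alice", "positionPath"));
    }
}
```

Without the scan in move(), both employees would be left with dangling
paths, which is exactly the integrity guarantee a real reference gives
you for free.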

Any suggestions?

Alessandro Bologna


On Apr 26, 2007, at 2:38 PM, Jukka Zitting wrote:

> Hi,
>
> On 4/26/07, Stefan Kurla <stefan.kurla@gmail.com> wrote:
>> I would appreciate the thoughts on references though. Reason being
>> that one of the biggest strengths of JSR-170 is the ability to store
>> references. I imagine a situation where i could have a nodetype call
>> docType which is either pdf or word strings. Say 80% of my documents
>> are word documents. Then the docType will have a reference to 80% of
>> all documents in my repository. If my repository is 100,000 files then
>> docType references 80,000 nodes.
>>
>> If what you say is correct that at every new reference, the complete
>> set of references are rewritten, then obviously this is a bottleneck.
>>
>> Should such a situation be avoided?
>
> Why would you need to use such references structure? I would rather
> use the node types to model such information. A search query like
> //element(*,my:wordDocument) will efficiently return you all such Word
> documents in your workspace.
>
> BR,
>
> Jukka Zitting

