jackrabbit-users mailing list archives

From Rami Ojares <rami.oja...@gmail.com>
Subject Re: A generic question about JackRabbit
Date Fri, 26 Mar 2010 22:16:25 GMT
Thank you very much for all your answers.
I think I got the overall flavour of how people on this list see data
management issues.
But Alex made a few radical comments, so I just can't refrain from
commenting on them a little.

> I think normalization is often thought to be absolutely fundamental
> for any data schema, because it is a central part of RDBMS and this is
> basically the only thing that was taught the last 20-30 years (went
> this route myself). But a major reason for normalization was simply
> the space constraint, it's not fundamental at all.

Data normalization is concerned with how the relational model should be used.
If you don't have the data in first normal form, then the relational
operators and other tools that the relational model provides cannot
query the data.
(Putting a comma-separated list in a single attribute makes it very
hard to formulate a query like
"Give me the rows where one of the list's components is smaller than X".)
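
To make the 1NF point concrete, here is a small sketch using Python's
sqlite3 module (the table and column names are hypothetical, just for
illustration) contrasting the two designs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Non-1NF design: a comma-separated list crammed into one column.
cur.execute("CREATE TABLE orders_flat (id INTEGER, quantities TEXT)")
cur.execute("INSERT INTO orders_flat VALUES (1, '5,12,7')")
# "Which orders contain a quantity smaller than 3?" cannot be asked
# with relational operators here; you would have to parse the string.

# 1NF design: one atomic value per row, so the query becomes trivial.
cur.execute("CREATE TABLE order_items (order_id INTEGER, quantity INTEGER)")
cur.executemany("INSERT INTO order_items VALUES (?, ?)",
                [(1, 5), (1, 12), (1, 7), (2, 2)])
cur.execute("SELECT DISTINCT order_id FROM order_items WHERE quantity < 3")
rows = cur.fetchall()
print(rows)  # [(2,)]
```

Note that the flat design also forces every application touching the
data to agree on the string parsing convention, which is exactly the
kind of redundancy that normalization is meant to prevent.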

2NF gives guidance on the meaning of a candidate key in a relation,
and so forth.

The point is that normalization could never have been about saving
space, because it is concerned with how you should use the relational
model so that you don't end up with redundant data that is
"out of sync" with other data in your system.

In fact it is not concerned at all with how much disk space is used,
because that is left entirely to the IMPLEMENTATION of the relational
model to decide.

Things like materialized views have nothing to do with the relational
model. They are just an example of SQL also being used to guide the
implementation of how data is stored.
I can easily imagine an RDBMS that materializes all views into, say,
RAM, disk, or a network of 10 million servers. But from the user's
(application's) point of view this is of no concern.

But what does concern the user is: do I get the correct value when I
query it? Can I update this piece of data without making the whole
system inconsistent?

I think people rarely distinguish between a data MODEL and its
various IMPLEMENTATION strategies.
Most of the time people are only concerned with the implementation,
and the data model is just a consequence of how things were
implemented. This, I think, is the situation with JackRabbit.

But before you start crucifying me let me just say that I don't see 
anything wrong with this approach.
It is just a bit too ad hoc for my own state of mind.

> The same story with the ACID constraints... banking accounts are not
> the only software application nowadays ;-)

Wow, that came out of the bush and hit me right there in the back 
between my shoulder blades ;-)

Basically what you are saying is that when you update data in your
storage system (whatever its data model),
you don't really care if only part of the data you sent there got
updated (Atomicity).
You don't care whether the rules you have set for your data are
respected (Consistency).
If many threads are accessing the data, it does not concern you if
they see each other's updates partially and modify each other's data
while their operations are underway (Isolation).
And certainly you don't give a toss about whether your data really
stays in the storage once you have put it there (Durability).
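
The atomicity point can be sketched in a few lines with Python's
sqlite3 module (the `accounts` table and the transfer scenario are
hypothetical, just to illustrate the guarantee): if any statement in
the transaction fails, rolling back leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    # Transfer 200 from alice to bob: the debit violates the CHECK rule.
    conn.execute("UPDATE accounts SET balance = balance + 200 "
                 "WHERE name = 'bob'")
    conn.execute("UPDATE accounts SET balance = balance - 200 "
                 "WHERE name = 'alice'")
    conn.commit()
except sqlite3.IntegrityError:
    # Atomicity: bob's credit is undone together with the failed debit.
    conn.rollback()

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both balances unchanged: alice 100, bob 50
```

Without the rollback, bob would have been credited money that was
never debited from alice, which is precisely the partial update the
atomicity property rules out.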

The reason the ACID properties are associated with the banking
industry is that people demand correctness when banks deal with their
money. Otherwise they would not give their money to the banks.
I have a feeling that other kinds of software could also benefit from
such correctness. But somehow the SQL-based databases have done a
poor job delivering on this promise.

In my view the reason is twofold:
1) poor SQL database implementations
2) users who have been using these implementations poorly

Once more ... before you start calling me pedantic and overscrupulous,
let me just say that you can let go of many of these checks if you
know how your data is updated and you have full control over how the
storage is used.

But in the generic case I don't think you can.

Summa summarum: I could take JackRabbit and use it as my
implementation of the relational model, or, let's say, as an
implementation of an XML database (yes, there used to be that kind of
"hot stuff" on the market a few years back :).
The data model of JackRabbit is the JCR API (if you can call a Java
API a data model).
And the project's deliverables are its implementation(s).

- rami
