jackrabbit-dev mailing list archives

From Edgar Poce <edgarp...@gmail.com>
Subject Re: Is JDBC persistence manager supported by jackrabbit?
Date Thu, 01 Sep 2005 03:35:33 GMT
Hi Vadim,

Vadim Gritsenko wrote:
> Edgar,
> Was trying to find more information following your references, but...
>> [1] http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1435
> Points to JIRA which states [1]:
>    Comment by Edgar Poce [12/Jul/05 06:00 AM]
>    This kind of approach is discouraged by design
> Can you please clarify your point? 

There are a couple of conversations in the archive about this. My point 
is that the PM contract is not suitable for mapping the ItemStates into 
a relational database with a table design that breaks the ItemState into 
its constituent parts. The PM is intended to stay simple, which means 
storing the ItemState as a whole without interpreting the data. See the 
JDBC PM under contrib.
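
A minimal sketch of the "store it as a whole" idea, assuming plain Java serialization and an in-memory map standing in for a single id/blob table. The class OpaqueStateStore and its methods are hypothetical and are not the Jackrabbit PersistenceManager API; the sketch only shows that the store never looks inside the state it is given.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not part of Jackrabbit.
class OpaqueStateStore {
    // Stands in for a two-column table: (id, serialized state).
    private final Map<String, byte[]> blobs = new HashMap<>();

    // Serializes the whole state into one blob without interpreting its fields.
    void store(String id, Serializable state) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        blobs.put(id, buf.toByteArray());
    }

    // Reads the blob back and deserializes it as a whole.
    Object load(String id) {
        byte[] data = blobs.get(id);
        if (data == null) {
            throw new NoSuchElementException(id);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean exists(String id) {
        return blobs.containsKey(id);
    }
}
```

The trade-off is visible in the sketch: the storage side stays trivial, but nothing outside the application can read or modify the stored data in a meaningful way.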

The main problem with storing the ItemStates in a complex schema is the 
Collection handling. Since changes to Collection fields are not tracked 
by add/update/remove-aware objects, all the elements in the Collection 
must be stored on each write call. This hurts performance when 
handling collections with lots of elements, even with the simple PMs 
included in the core.
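
To make the cost concrete, here is a rough sketch that counts row writes against an imaginary normalized child-entries table. Everything in it is hypothetical (the class ChildListWriteCost, the delete-then-reinsert strategy, the row counting); it is not Jackrabbit code and does not reproduce the JCR-188 measurements. It only contrasts a full rewrite with what a change-aware collection would allow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Jackrabbit.
class ChildListWriteCost {

    // Counts row operations issued against the imaginary table.
    static class RowCounter {
        int deletes;
        int inserts;

        int total() {
            return deletes + inserts;
        }
    }

    // Without change tracking the store only sees the full collection, so
    // (by assumption) it drops every stored row and re-inserts every element.
    static void storeWholeCollection(List<String> children, int previouslyStored, RowCounter rows) {
        rows.deletes += previouslyStored;
        rows.inserts += children.size();
    }

    // With an add/remove-aware collection only the delta would be written.
    static void storeDelta(List<String> added, List<String> removed, RowCounter rows) {
        rows.inserts += added.size();
        rows.deletes += removed.size();
    }

    // Row operations needed to add one child to a node with existingChildren children.
    static int fullRewriteCost(int existingChildren) {
        List<String> children = new ArrayList<>();
        for (int i = 0; i < existingChildren + 1; i++) {
            children.add("child" + i);
        }
        RowCounter rows = new RowCounter();
        storeWholeCollection(children, existingChildren, rows);
        return rows.total();
    }

    // Row operations needed for the same change when only the delta is written.
    static int deltaCost() {
        RowCounter rows = new RowCounter();
        storeDelta(Collections.singletonList("newChild"), Collections.<String>emptyList(), rows);
        return rows.total();
    }
}
```

Under these assumptions, adding one child to a node that already has 3000 children costs 6001 row operations with the full rewrite (3000 deletes plus 3001 inserts) and 1 with the delta.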

See the second chart in http://issues.apache.org/jira/browse/JCR-188. On 
my PIV box with Object PM + cqfs, any write operation (e.g. setting a 
property) takes up to half a second once the given node reaches 3k children.
If I ran the same test with the impl proposed in JCR-91, the 
half-second mark would be reached much sooner than at 3k children; just a 
hundred children would make the repo unbearably slow.

When I decided to write the JDBC PM proposed in JCR-91, I wanted:

1 - a mature, transactional and scalable persistence storage
2 - use rdbms administrative tools, like scheduled backups, etc.
3 - rdbms referential integrity
4 - avoid redundancy. PMs store the NodeReferences twice.
5 - storage that allows the data to be modified easily, just in case.

But in order to achieve the above goals the PM would have to interpret 
the data :(. Maybe we can bring this up again after the first release ...

> Or, maybe point to the document /
> discussion regarding the design?
Even though it's not directly related, you might want to take a look at 
Dominique's post about Jackrabbit internals. See 

>> [2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
> Points to Wiki page which does not clarify your POV either. 
It's not my point of view. I just collected the devs' opinions on this 
issue from the mailing list. If it's not clear, please trace the 
conversations in the archive and clarify it.

> It states though:
>    The PM interface was never intended as being a general SPI that
>    you could implement in order to integrate external datasources
>    with proprietary formats (e.g. a customers database).
> This raises the question, what is the recommended SPI to code against?
I think that the jcr-ext project under contrib might be a good starting 
point. Or, even though the PM is not intended to be an SPI, you can manage 
to plug in your legacy data if you do it carefully.

> PS Wiki page has incorrect statement:
>     XML PersistenceManager
>       * Write operations are synchronized
> AFAICS, XML PM (unnecessarily) synchronizes all calls, including load() 
> and exist() calls. 
Why incorrect? Maybe incomplete...

> Does it mean FileSystem interface considered to be
> single threaded? 
I don't think so.

> Does not make much sense, though...
I agree. I think that the concurrency issue was handled first at the 
SHISM level, then it was moved to the PM, and then back to the SHISM 
(see http://issues.apache.org/jira/browse/JCR-164). Those synchronized 
modifiers seem to be there because the PM contract is not very clear 
yet, at least for me :(.


> Thanks,
> Vadim
> [1] http://issues.apache.org/jira/browse/JCR-91#action_12315534
