jackrabbit-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: JCR & thesis
Date Mon, 31 Mar 2008 16:05:13 GMT

2008/3/31 Andreas Hartmann <andreas@apache.org>:
>  Martin wrote:
> > I've talked to my supervisor and comparison to other technologies with
> > some benchmarks could be interesting.
>  I wonder if it is really appropriate, or even possible, to compare
>  "technologies" regarding performance. IMO the performance is rather an
>  aspect of the implementation.

I for one would be very interested in seeing cross-technology
comparisons for various common use cases. There are many cases for
which relational databases are better suited than content
repositories, but the opposite is also true.

See http://www.mit.edu/~dna/vldb07hstore.pdf for a very interesting
paper on the limits of traditional RDBM systems. One of the key points
related to JCR is the observation (see section 3.1) that many typical
applications use "tree schemas", i.e. a hierarchy of 1-n relationships
that map very well to the hierarchical model in Jackrabbit. Most
notably a hierarchical database can in many cases avoid expensive
JOINs for such schemas.

It would be really cool to see a thesis that evaluates JCR content
repositories in light of the above paper.

>  - SQL query speed comparison with MySQL/PostgreSQL
>  - read/write comparisons with filesystems

I'm sure that Jackrabbit will lose on both of those comparisons. The
main benefit in using a JCR content repository comes not from
duplicating content structures found in existing storage models, but
in going beyond their current limitations.

For example any non-trivial RDBMS application requires a number of
joins that can easily become quite expensive. Standard JCR doesn't
even support joins as a query concept, but the tree hierarchy gives
1-n relationships and thus many 1-n joins essentially for free. Thus
I'd not compare the raw query performance between a relational
database and a content repository, but rather the higher level
performance for selected use cases based on a content model that's
designed to best leverage the capabilities of the underlying system.
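
As a rough illustration of the point about 1-n relationships, here is a
minimal sketch. It does not use Jackrabbit or the JCR API: Python's
sqlite3 module stands in for the relational side, and a nested dict
stands in for a repository tree. The table names, the path layout and
both helper functions are hypothetical.

```python
import sqlite3

# Relational model: orders and their items live in two tables,
# so reading the items of one customer requires a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (id INTEGER PRIMARY KEY,
                         order_id INTEGER REFERENCES orders(id),
                         product TEXT);
    INSERT INTO orders VALUES (1, 'alice');
    INSERT INTO orders VALUES (2, 'bob');
    INSERT INTO items  VALUES (1, 1, 'pen');
    INSERT INTO items  VALUES (2, 1, 'ink');
    INSERT INTO items  VALUES (3, 2, 'paper');
""")


def items_via_join(customer):
    """Return the products ordered by a customer using a 1-n JOIN."""
    rows = conn.execute(
        "SELECT i.product FROM orders o "
        "JOIN items i ON i.order_id = o.id "
        "WHERE o.customer = ? ORDER BY i.id",
        (customer,),
    )
    return [product for (product,) in rows]


# Hierarchical model (a stand-in for a content repository tree):
# the 1-n relationship is the parent/child structure itself.
tree = {
    "orders": {
        "alice": {"items": ["pen", "ink"]},
        "bob": {"items": ["paper"]},
    }
}


def items_via_path(path):
    """Return the value at a slash-separated path by walking the tree."""
    node = tree
    for name in path.strip("/").split("/"):
        node = node[name]
    return node


print(items_via_join("alice"))
print(items_via_path("/orders/alice/items"))
```

Both calls return the same items, but the second one never recombines
two tables. This says nothing about actual timings in Jackrabbit or in
any particular RDBMS; it only shows why the content model matters for
a fair comparison.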

The same goes for JCR versus the file system. Most non-trivial
applications that use the file system for storage end up using XML
files, or other parsed resources for handling fine-grained content
like individual dates, strings, numbers, etc. A content repository
natively supports such fine-grained content, so many read and update
operations that target such "small data" are much more convenient and
often faster than in a file-based solution that requires explicit
parsing and serializing (not to mention locking) of larger chunks of
data.

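The difference in handling "small data" can be sketched as follows.
This is only an assumption-laden illustration, not the JCR API: an XML
string stands in for a file on disk, a plain dict stands in for a
repository node with properties, and the function names are
hypothetical.

```python
import xml.etree.ElementTree as ET


def update_in_xml(document, new_date):
    """File-style update: parse the whole document, change one value,
    then serialize the whole document again."""
    root = ET.fromstring(document)
    root.find("modified").text = new_date
    return ET.tostring(root, encoding="unicode")


def update_in_node(node, new_date):
    """Repository-style update: set a single property on the node."""
    node["modified"] = new_date


xml_doc = "<article><title>JCR</title><modified>2008-03-30</modified></article>"
node = {"title": "JCR", "modified": "2008-03-30"}

xml_doc = update_in_xml(xml_doc, "2008-03-31")
update_in_node(node, "2008-03-31")

print(xml_doc)
print(node["modified"])
```

In the file-based variant the cost of changing one date grows with the
size of the document, and a real application would also need to lock
the file while rewriting it. The sketch leaves locking out entirely.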

Jukka Zitting
