jackrabbit-dev mailing list archives

From Andreas Hartmann <andr...@apache.org>
Subject Re: JCR & thesis
Date Mon, 31 Mar 2008 21:49:28 GMT
Jukka Zitting wrote:

[...]

>>  - SQL query speed comparison with MySQL/PostgreSQL
>>  - read/write comparisons with filesystems
> 
> I'm sure that Jackrabbit will lose on both of those comparisons. The
> main benefit in using a JCR content repository comes not from
> duplicating content structures found in existing storage models, but
> in going beyond their current limitations.

I'm not sure whether this is covered by the spec, but is it possible to 
query a pre-selected set of nodes (e.g., a subtree or the direct children 
of a node)? In that case I could imagine that Jackrabbit might be a lot 
faster than an RDBMS.
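
For illustration, this is roughly what I have in mind - just a sketch 
against the JCR query API, with made-up paths and node types:

  import javax.jcr.NodeIterator;
  import javax.jcr.RepositoryException;
  import javax.jcr.Session;
  import javax.jcr.query.Query;
  import javax.jcr.query.QueryManager;

  public class SubtreeQueryExample {

      // List all nt:unstructured nodes below /content/articles.
      // Anchoring the XPath at /jcr:root/<path> restricts the query to
      // that subtree instead of the whole workspace; a single slash
      // before element() would restrict it to the direct children only.
      public static void listSubtree(Session session)
              throws RepositoryException {
          QueryManager qm = session.getWorkspace().getQueryManager();
          Query query = qm.createQuery(
              "/jcr:root/content/articles//element(*, nt:unstructured)",
              Query.XPATH);
          NodeIterator nodes = query.execute().getNodes();
          while (nodes.hasNext()) {
              System.out.println(nodes.nextNode().getPath());
          }
      }
  }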

Regarding the file system - doesn't that depend on the cache settings? I 
could imagine that Jackrabbit offers more - or at least more easily 
accessible - cache configurations, based on node types etc. But I have to 
admit I haven't looked at the feature list for quite a long time, so I'd 
better catch up before asking more silly questions :)

> For example any non-trivial RDBMS application requires a number of
> joins that can easily become quite expensive. Standard JCR doesn't
> even support joins as a query concept, but the tree hierarchy gives
> 1-n relationships and thus many 1-n joins essentially for free. Thus
> I'd not compare the raw query performance between a relational
> database and a content repository, but rather the higher level
> performance for selected use cases based on a content model that's
> designed to best leverage the capabilities of the underlying system.

That sounds very reasonable. As a CMS developer, I'd be very interested 
in use cases like these:

Find all documents where type="image", the keyword list (a multi-value 
property) contains both "Spring" and "Flower", and the width is between 
500 and 600px. That's a typical query in asset management.
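
In JCR XPath I'd expect this to look roughly like the sketch below (node 
type and property names are made up; a comparison against a multi-valued 
property holds if it holds for at least one of its values):

  import javax.jcr.NodeIterator;
  import javax.jcr.RepositoryException;
  import javax.jcr.Session;
  import javax.jcr.query.Query;

  public class ImageAssetQueryExample {

      // Images tagged with both "Spring" and "Flower" whose width is
      // between 500 and 600.
      public static NodeIterator findImages(Session session)
              throws RepositoryException {
          Query query = session.getWorkspace().getQueryManager().createQuery(
              "//element(*, nt:unstructured)"
              + "[@type = 'image'"
              + " and @keywords = 'Spring' and @keywords = 'Flower'"
              + " and @width >= 500 and @width <= 600]",
              Query.XPATH);
          return query.execute().getNodes();
      }
  }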

Find all documents containing an element that matches the XPath
//a[local-name() = 'xhtml' and namespace-uri() = 'http://...' and
starts-with(@href, 'lenya-document:c2c38f30-ff68-11dc-9682-9dea3e2477d4')]
That would be a typical way to find links that would be broken after a 
document is removed from the live site. I know that JCR doesn't support 
this directly - I guess this is where XML DBs shine. With JCR, is it 
necessary to traverse all documents and evaluate the XPath against each 
one, or is there a better solution?
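
One workaround I could imagine (purely a sketch - the "references" 
property is made up): extract the outgoing link targets into a 
multi-valued property whenever a document is saved, so that the 
broken-link check becomes an ordinary property query instead of an XPath 
evaluation over every XML body:

  import javax.jcr.NodeIterator;
  import javax.jcr.RepositoryException;
  import javax.jcr.Session;
  import javax.jcr.query.Query;

  public class BrokenLinkQueryExample {

      // Find all documents whose (hypothetical) multi-valued "references"
      // property points at the document that is about to be removed.
      public static NodeIterator findReferrers(Session session, String uuid)
              throws RepositoryException {
          Query query = session.getWorkspace().getQueryManager().createQuery(
              "//element(*, nt:unstructured)"
              + "[@references = 'lenya-document:" + uuid + "']",
              Query.XPATH);
          return query.execute().getNodes();
      }
  }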

> The same goes for JCR versus the file system. Most non-trivial
> applications that use the file system for storage end up using XML
> files, or other parsed resources for handling fine-grained content
> like individual dates, strings, numbers, etc. A content repository
> natively supports such fine-grained content, so many read and update
> operations that target such "small data" are much more convenient and
> often faster than in a file-based solution that requires explicit
> parsing and serializing (not to mention locking) of larger chunks of
> data.

In Lenya, we use XML files + Lucene for content and metadata indexing. 
Finding broken links (see above) is rather slow, while metadata queries 
are quite fast. I'd be very interested in how this would change with 
Jackrabbit.

Martin, if you'd like to consider including the Lenya repository in your 
comparison, I'd try to assist if I find the time.


Jukka, thanks a lot for your valuable comments,

-- Andreas



-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01

