jackrabbit-users mailing list archives

From Alexander Klimetschek <aklim...@day.com>
Subject Re: Query vs Manual Node Iteration
Date Fri, 19 Mar 2010 10:52:48 GMT
On Fri, Mar 19, 2010 at 09:43, Gadbury <gadbury@googlemail.com> wrote:
> What would be the fastest method to retrieve all account nodes from under
> accounts of provider[2] ?  We can assume that for method 1 (the query) that
> my:account has the property providerUUID (STRING).
> 1) an XPath query such as:  //element(*,
> my:account)[@providerUUID='e525ad70-3331-11df-9aae-0800200c9a66']
> 2) Getting the (provider[2]) node by UUID, get the accounts node underneath,
> call accountsNode.getNodes() and iterate with the NodeIterator, checking
> each node is of nodeType my:account and adding the node to a list?

I would strongly recommend getting rid of the SNS (same-name
siblings). Then you no longer need UUIDs to identify nodes and can
use paths instead, which is more readable and more JCR-like.

And don't use providerUUID as a property on a node for yet another
reference to one of its ancestors - the parent relationship already
gives you this! You have to see the parent-child relationship (or
the chain of such relationships = the path) as the "primary key" when
comparing JCR to an RDBMS, and you should make use of it as much as
possible.

Following that advice, also put only nodes of type my:account under
the subfolder "Accounts", so you don't have to filter out nodes with
the wrong node type, because there won't be any. This is actually a
good example of how you can avoid node types just by having the
"right" content structure (and then using nt:unstructured for
everything, for full future flexibility) (*).

You would then do 2), without having to filter.
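
A minimal sketch of option 2) could look like the following. It is
only an illustration: SimpleNode is a hypothetical in-memory stand-in
for javax.jcr.Node so that the example runs without a repository, and
the path "providers/acme/Accounts" is made up. The comment shows the
roughly equivalent JCR 2.0 calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccountListing {

    /** Hypothetical in-memory stand-in for javax.jcr.Node: a name plus children. */
    static final class SimpleNode {
        final String name;
        // Keyed by name, so there are no same-name siblings by construction.
        final Map<String, SimpleNode> children = new LinkedHashMap<>();

        SimpleNode(String name) { this.name = name; }

        SimpleNode addNode(String childName) {
            SimpleNode child = new SimpleNode(childName);
            children.put(childName, child);
            return child;
        }

        /** Resolves a relative path such as "providers/acme/Accounts". */
        SimpleNode getNode(String relPath) {
            SimpleNode current = this;
            for (String segment : relPath.split("/")) {
                current = current.children.get(segment);
                if (current == null) {
                    throw new IllegalArgumentException("No such node: " + relPath);
                }
            }
            return current;
        }
    }

    // With the real JCR 2.0 API this would be roughly (path is an assumption):
    //   Node accounts = session.getNode("/providers/acme/Accounts");
    //   for (NodeIterator it = accounts.getNodes(); it.hasNext(); ) {
    //       Node account = it.nextNode();
    //   }

    /** Option 2): resolve the Accounts folder by path and take every child. */
    static List<String> listAccountNames(SimpleNode root, String accountsPath) {
        List<String> names = new ArrayList<>();
        for (SimpleNode account : root.getNode(accountsPath).children.values()) {
            names.add(account.name); // no node type check: the folder only holds accounts
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleNode root = new SimpleNode("");
        SimpleNode accounts = root.addNode("providers").addNode("acme").addNode("Accounts");
        accounts.addNode("alice");
        accounts.addNode("bob");
        System.out.println(listAccountNames(root, "providers/acme/Accounts")); // prints [alice, bob]
    }
}
```

Because the folder is reached by path and contains nothing but
accounts, neither a UUID lookup nor a node type filter is needed.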


(*) There are often cases where certain entities are scattered across
the repository (if you make use of free-form content structures), and
then search is an easy way to get them. In those cases a node type is
useful as a "marker". Personally I think node types should not put
too many constraints on the content, since experience shows you will
always have to change them in the future. Hence, always inherit from
nt:unstructured for those "marker" node types.

> Obviously, the second way I do not need a weak reference to the provider in
> nodes of type my:account.

Yes, as I said, that's a bad idea. It just wastes storage space.

> I would assume that the second method is quicker as we're only searching
> under a specific node.  The query would search the entire repository
> structure, including my:account nodes under other, irrelevant providers so
> should be faster...?  I have tried specifying a path constraint in my XPath
> queries but I read that is slower due to extra path checks being made.

Yes, the path is not indexed (in the current Jackrabbit Lucene search
implementation); only properties and node types are. A leading path
in your XPath query leads to additional checks that access the
persistence layer (if not cached) for nodes that aren't going to be
in the result set, which is always slower than a pure index lookup.
However, this doesn't mean it is always slow; in most cases it will
be fast enough.
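
To make the extra cost concrete, here is a toy model of it. This is
not Jackrabbit code: Hit, loadPath and filterByPath are hypothetical
names, and the index is just a list. It only shows that a path
constraint has to be checked for every hit the index returns,
including hits that are then thrown away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathConstraintCost {

    /** Counts simulated persistence-layer reads (toy model, not Jackrabbit code). */
    static int pathLookups = 0;

    /** Hypothetical index hit: the index knows the node id, but not its path. */
    static final class Hit {
        final String id;
        final String path; // only meant to be read through loadPath()

        Hit(String id, String path) {
            this.id = id;
            this.path = path;
        }
    }

    /** Stands in for resolving a node's path through the persistence layer. */
    static String loadPath(Hit hit) {
        pathLookups++;
        return hit.path;
    }

    /** Applies a leading path constraint to hits that the index already returned. */
    static List<String> filterByPath(List<Hit> indexHits, String pathPrefix) {
        List<String> result = new ArrayList<>();
        for (Hit hit : indexHits) {
            // The path is looked up for every hit, whether or not it matches.
            if (loadPath(hit).startsWith(pathPrefix)) {
                result.add(hit.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pretend these are all my:account hits from the node type index.
        List<Hit> hits = Arrays.asList(
                new Hit("a1", "/providers/acme/Accounts/alice"),
                new Hit("a2", "/providers/acme/Accounts/bob"),
                new Hit("a3", "/providers/other/Accounts/carol"));
        List<String> matches = filterByPath(hits, "/providers/acme/");
        // prints [a1, a2] after 3 path lookups
        System.out.println(matches + " after " + pathLookups + " path lookups");
    }
}
```

Two nodes end up in the result, but three paths had to be resolved.
With many my:account nodes under other providers, that gap grows.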

> I would be very interested to learn the advantages and disadvantages of both
> approaches.  I also have started to use SQL2 queries (mainly for the join
> functionality) but I believe they are slower than XPath.

SQL2 uses the same underlying Lucene implementation as the XPath query
evaluator. It is quite new, so some things might not be perfect yet,
but in theory the performance should match. XPath doesn't have joins,
so you can't compare the two in this case.


Alexander Klimetschek
