jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Some performance questions about Jackrabbit
Date Wed, 06 Feb 2008 10:18:20 GMT
Lorenzo Dini wrote:
> Indexing and Searching
> 
> 12) How much improvement comes from specifying indexing rules? I mainly 
> use the name property for searching, plus a few others. Would giving 
> these properties priority speed things up a lot? I think that most of 
> the time is spent not on the Lucene query itself but on loading and 
> sorting the nodes.

an indexing rule has an effect on the size of the index. if fewer properties are 
indexed, the index will be smaller and queries will be slightly faster. the 
primary use of the rules, however, is to assign boost values. those 
affect the ordering of result nodes when you do an 'order by 
@jcr:score'. boost values in the configuration do not have an effect on performance.

performance wrt sorting of nodes has been greatly improved in 1.4 and should now 
be faster than in 1.3.x.

> 13) When exactly are the nodes loaded from the DB by the QueryEngine?

this depends on the query, the configuration and the sort criteria. if the 
configuration is set to respectDocumentOrder=true (the default, but it will 
change to false in jackrabbit >= 1.5) and there are no sort criteria in the 
query statement, then all result nodes are loaded and sorted according to 
their document order.

> What happens during query.execute()?
> What happens during query.getNodes()? How many nodes are read from the DB?

none, except if respectDocumentOrder=true and there are no sort criteria.

> When (and how) is the sorting done?

sorting is done at the very end of the query. document order is calculated from 
the content directly. any other sorting (based on property values) is done using 
lucene.

> What happens during iterator.nextNode()?

the uuid of the node is resolved into a Node instance. usually the node needs 
to be read from the persistence manager, unless it is already present in the cache.
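
as a rough illustration of this lazy resolution, here is a minimal sketch in plain 
java. it is a toy model, not jackrabbit's actual classes: LazyResultSketch, 
LazyIterator and the loader function are hypothetical names, and a HashMap stands 
in for the item cache. the point is only that a read happens per call to next(), 
and only for ids that are not cached yet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of lazy result iteration; not Jackrabbit's actual classes. */
class LazyResultSketch {

    /** Resolves ids to items on demand, consulting a cache first. */
    static final class LazyIterator<T> implements Iterator<T> {
        private final Iterator<String> ids;
        private final Map<String, T> cache;
        private final Function<String, T> loader; // stands in for the persistence manager
        private int loads = 0;

        LazyIterator(List<String> ids, Map<String, T> cache, Function<String, T> loader) {
            this.ids = ids.iterator();
            this.cache = cache;
            this.loader = loader;
        }

        @Override
        public boolean hasNext() {
            return ids.hasNext();
        }

        @Override
        public T next() {
            String id = ids.next();
            T item = cache.get(id);
            if (item == null) {
                // cache miss: this is the only place a (simulated) read happens
                item = loader.apply(id);
                loads++;
                cache.put(id, item);
            }
            return item;
        }

        int loads() {
            return loads;
        }
    }

    /** Returns how many loader calls were needed to read the first n results. */
    static int loadsForFirst(int n, List<String> ids, Map<String, String> cache) {
        LazyIterator<String> it = new LazyIterator<>(ids, cache, id -> "node:" + id);
        for (int i = 0; i < n && it.hasNext(); i++) {
            it.next();
        }
        return it.loads();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d");
        Map<String, String> cache = new HashMap<>();
        cache.put("b", "node:b"); // "b" is already cached
        System.out.println(loadsForFirst(2, ids, cache)); // prints 1: only "a" is loaded
    }
}
```

reading only the first two of four hits triggers a single load here, because the 
second hit is already in the cache and the remaining ids are never resolved.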

> 14) How does the sorting work, since it cannot be done by the DB? Is it done 
> by Lucene?

correct.

> or are all the nodes simply sorted using Collections.sort? 
> That would mean that all nodes must be loaded before returning the first 
> one, even if you need only the first N.

this is only the case for results in document order. we assumed people would 
rarely need this and did not optimize it.
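
to make the difference concrete, here is a minimal sketch in plain java that 
counts simulated reads for both cases. it is a toy model under stated 
assumptions, not jackrabbit's implementation: SortCostSketch, LoadedNode and 
both method names are hypothetical, the "indexed" map stands in for sort values 
held by the index, and the "positions" map stands in for repository content 
that is only known once a node is loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two sort paths; not Jackrabbit's actual implementation. */
class SortCostSketch {

    /** Hypothetical stand-in for a node read from the persistence manager. */
    static final class LoadedNode {
        final String id;
        final int position; // document-order position, known only after loading

        LoadedNode(String id, int position) {
            this.id = id;
            this.position = position;
        }
    }

    private final Map<String, Integer> positions; // simulated repository content
    int loads = 0;                                // simulated persistence reads

    SortCostSketch(Map<String, Integer> positions) {
        this.positions = positions;
    }

    private LoadedNode load(String id) {
        loads++;
        return new LoadedNode(id, positions.get(id));
    }

    /** Sort key comes from the index: sort the ids, then load only the first n. */
    List<LoadedNode> firstByIndexedValue(List<String> hits, Map<String, String> indexed, int n) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((String id) -> indexed.get(id)));
        List<LoadedNode> result = new ArrayList<>();
        for (String id : sorted.subList(0, Math.min(n, sorted.size()))) {
            result.add(load(id));
        }
        return result;
    }

    /** Sort key comes from content: every hit is loaded before the first is returned. */
    List<LoadedNode> firstInDocumentOrder(List<String> hits, int n) {
        List<LoadedNode> all = new ArrayList<>();
        for (String id : hits) {
            all.add(load(id));
        }
        all.sort(Comparator.comparingInt((LoadedNode node) -> node.position));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> positions = new HashMap<>();
        positions.put("a", 2);
        positions.put("b", 0);
        positions.put("c", 1);
        positions.put("d", 3);
        Map<String, String> indexed = new HashMap<>();
        indexed.put("a", "zeta");
        indexed.put("b", "alpha");
        indexed.put("c", "mid");
        indexed.put("d", "beta");
        List<String> hits = Arrays.asList("a", "b", "c", "d");

        SortCostSketch byIndex = new SortCostSketch(positions);
        byIndex.firstByIndexedValue(hits, indexed, 2);
        System.out.println(byIndex.loads); // prints 2

        SortCostSketch byDocOrder = new SortCostSketch(positions);
        byDocOrder.firstInDocumentOrder(hits, 2);
        System.out.println(byDocOrder.loads); // prints 4
    }
}
```

with four hits and only the first two requested, the index-sorted path reads two 
nodes while the document-order path reads all four.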

> How can this be sped up?

this depends on the query you have. can you please provide some query statements?

> 15) Are there any changes in JR 1.4? I saw it is possible to limit the 
> number of entries returned and set an offset; how does this work with sorting?

actually, quite a lot has changed. performance has been improved for property 
existence checks, hierarchy checks are faster and sorting has been improved as well.

> 16) In case I need a specific subnode with a particular property, is it 
> faster to list all the subnodes using node.getNodes() and pick 
> the right one, or to do a Lucene query? I imagine it depends on the 
> number of subnodes, but for approximately 20 subnodes I would expect the 
> overhead of Lucene to exceed that of getNodes().

if there are only 20 child nodes the manual check is probably faster than a query.
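
the manual check amounts to a plain linear scan. the sketch below shows the 
idea in self-contained java so that it runs without a repository: Child is a 
hypothetical stand-in for javax.jcr.Node, and findChild is a hypothetical 
helper. against a real repository the loop would iterate over node.getNodes() 
and compare the property value of each child instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of picking a child node by property value without a query. */
class ChildScanSketch {

    /** Hypothetical stand-in for a child node: a name plus string properties. */
    static final class Child {
        final String name;
        final Map<String, String> properties = new HashMap<>();

        Child(String name) {
            this.name = name;
        }

        Child with(String key, String value) {
            properties.put(key, value);
            return this;
        }
    }

    /** Returns the first child whose property equals the given value, or null. */
    static Child findChild(List<Child> children, String property, String value) {
        for (Child child : children) {
            // a child without the property yields null, which never equals value
            if (value.equals(child.properties.get(property))) {
                return child;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(
                new Child("a").with("color", "red"),
                new Child("b").with("color", "blue"),
                new Child("c"));
        System.out.println(findChild(children, "color", "blue").name); // prints b
    }
}
```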

regards
  marcel
