From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Jackrabbit query performance issues
Date Mon, 12 Feb 2007 12:07:38 GMT
Hi Savas,

Savas Triantafillou wrote:
> 1.  As you may see in the following queries, I would like to load all nodes
> of a certain type using several forms:
> 
>     The first one provides no information about the root path of the nodes,
> nor any information about their name
> 
>             DEBUG - QueryImpl.execute(149) | executed in 0,26 s.
> (//element(*, my:object))
> 
> 
>      The second one provides information about the node's name and is
> already slower than the first one, considering that it executed immediately
> after the first query
>      (i.e. cache seemed not to be working) and that it is slightly more
> specific than the first one
> 
>               DEBUG - QueryImpl.execute(149) | executed in 0,36 s.
> (//element(objectName, my:object))

This runs much faster on my Jackrabbit instance.

I'm using 2000 test nodes of type nt:unstructured; each query returns 21 nodes.

QueryImpl: executed in 0.14 s. (//element(node0, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node1, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node2, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node3, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node4, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node5, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node6, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node7, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node8, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node9, nt:unstructured))

The first query is considerably slower because the path cache in the query 
handler needs to be filled.
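To illustrate that warm-up effect, here is a minimal Java sketch of a lazily filled cache: the first pass pays the cost of resolving every entry, and later passes are served from memory. The class WarmUpCache and its resolver function are hypothetical; this is not Jackrabbit's actual path cache, only the general pattern behind "the first query is slower".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class WarmUpCache {

    private final Map<String, String> cache = new HashMap<>();
    // Hypothetical stand-in for an expensive computation (e.g. resolving a node path).
    private final Function<String, String> resolver;
    private int misses = 0;

    WarmUpCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Returns the cached value, computing (and counting a miss) only the first time.
    String lookup(String id) {
        return cache.computeIfAbsent(id, key -> {
            misses++;
            return resolver.apply(key);
        });
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        WarmUpCache cache = new WarmUpCache(id -> "/content/" + id);
        for (int run = 0; run < 2; run++) {
            cache.lookup("node0");
            cache.lookup("node1");
            // The miss count only grows during the first run; the second run is all cache hits.
            System.out.println("run " + run + ": total misses = " + cache.misses());
        }
    }
}
```

Running it prints a total of 2 misses after both runs, which mirrors the pattern above: only the first query pays for filling the cache.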

>      The third query is similar to first one except the presence of the
> ordering. Is the difference in time justified only by the presence of the
> ordering ?
> 
>                DEBUG - QueryImpl.execute(149) | executed in 1,03 s.
> (//element(*, my:object) order by @modified descending)
> 
>       The fourth query is similar to the second one with the addition of
> the ordering. Taking into account query execution times so far,
>       this time seems the most rational
> 
>                DEBUG - QueryImpl.execute(149) | executed in 0,58 s.
> (//element(objectName, my:object) order by @modified descending)
> 
>        The fifth query is more specific concerning the path of the nodes.
> It seems that the cache is working now
> 
>                DEBUG - QueryImpl.execute(149) | executed in 0,12 s.
> (/jcr:root/my:system/my:objectRoot//element(*, my:object))
> 
>         The sixth query is even more specific, yet it is slower than the
> above one!!!!

That's probably because it involves an additional AND operation: nodes with a
certain name are intersected with nodes of a certain type, whereas the fifth
query only searches for nodes of a certain type.
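A simplified sketch of that extra AND step, assuming the name lookup and the type lookup each yield a set of hypothetical node IDs. Jackrabbit's default query handler works on a Lucene index rather than on Java sets like these; the class below (NameTypeIntersection is a made-up name) only illustrates why a name + type query has more work to do than a type-only query.

```java
import java.util.Set;
import java.util.TreeSet;

public class NameTypeIntersection {

    // element(*, my:object): only the type lookup is needed.
    static Set<Integer> matchType(Set<Integer> typeHits) {
        return new TreeSet<>(typeHits);
    }

    // element(objectName, my:object): the name hits must additionally be
    // intersected (AND) with the type hits, an extra step per query.
    static Set<Integer> matchNameAndType(Set<Integer> nameHits, Set<Integer> typeHits) {
        Set<Integer> result = new TreeSet<>(nameHits);
        result.retainAll(typeHits);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> typeHits = Set.of(1, 2, 3, 4, 5); // hypothetical IDs of my:object nodes
        Set<Integer> nameHits = Set.of(2, 4, 6);       // hypothetical IDs of nodes named objectName
        System.out.println(matchType(typeHits));
        System.out.println(matchNameAndType(nameHits, typeHits));
    }
}
```

The result of the second query is smaller, but producing it required both lookups plus the intersection.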

>                 DEBUG - QueryImpl.execute(149) | executed in 0,25 s.
> (/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
> 
>         The last two queries differ in the presence of the ordering
> 
>                   DEBUG - QueryImpl.execute(149) | executed in 0,62 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*, my:object) order by @modified descending)
>                  DEBUG - QueryImpl.execute(149) | executed in 0,14 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName, my:object) order by @modified descending)
> 
> 
> Now, in order to have a more complete view, I have changed the order of the
> queries in that more specific queries are executed first. Here are the
> results
> 
> DEBUG - QueryImpl.execute(149) | executed in 0,55 s.
> (/jcr:root/my:system/my:objectRoot//element(*, my:object))
> DEBUG - QueryImpl.execute(149) | executed in 0,44 s.
> (/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
> DEBUG - QueryImpl.execute(149) | executed in 1,36 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*, my:object) order by @modified descending)
> DEBUG - QueryImpl.execute(149) | executed in 0,16 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName, my:object) order by @modified descending)
> DEBUG - QueryImpl.execute(149) | executed in 0,03 s.
> (//element(*, my:object))
> DEBUG - QueryImpl.execute(149) | executed in 0,30 s.
> (//element(objectName, my:object))
> DEBUG - QueryImpl.execute(149) | executed in 0,28 s.
> (//element(*, my:object) order by @modified descending)
> DEBUG - QueryImpl.execute(149) | executed in 0,11 s.
> (//element(objectName, my:object) order by @modified descending)
> 
> 
> My belief is that there is no specific rule for creating a query that will
> guarantee a satisfactory time, not even the most obvious one, i.e. the more
> specific the query is, the faster it becomes.

This is not always the case, e.g. a more specific query may in some cases also
be more complex to execute.

> 2.  For each one of the 340 nodes I have created 40 versions and then rerun
> the above queries. All times tripled which makes me think that a query of
> type
> 
>     //element(*, my:nodeType) will make Jackrabbit search through its
> version nodes as well. If this is the case, why is this happening?

Because the query also covers the jcr:system subtree, which contains the version
store. If you are not interested in nodes from the version store, you need to
exclude the jcr:system subtree, e.g. by keeping your content under a designated
node instead of directly under the root node. Then you can search just in your
content:
/jcr:root/my:content//element(*, my:type)
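A minimal sketch of building such a scoped statement in Java. The helper typeQueryUnder and the root /my:content are assumptions for illustration only; with a live JCR session the resulting string would be passed to the workspace's QueryManager via createQuery(statement, Query.XPATH).

```java
public class ScopedQuery {

    // Builds an XPath statement that only searches below contentRoot, so nodes
    // under /jcr:system (including the version store) are not visited.
    // Passing "/" yields the unscoped query, which does include /jcr:system.
    static String typeQueryUnder(String contentRoot, String nodeType) {
        if (!contentRoot.startsWith("/")) {
            throw new IllegalArgumentException("contentRoot must be an absolute path");
        }
        // Drop a trailing slash so we don't produce "...content///element(...)".
        String root = contentRoot.endsWith("/")
                ? contentRoot.substring(0, contentRoot.length() - 1)
                : contentRoot;
        return "/jcr:root" + root + "//element(*, " + nodeType + ")";
    }

    public static void main(String[] args) {
        String stmt = typeQueryUnder("/my:content", "my:type");
        System.out.println(stmt);
        // With a live session (not part of this sketch):
        // session.getWorkspace().getQueryManager().createQuery(stmt, Query.XPATH).execute();
    }
}
```

Keeping the root path as a parameter makes it easy to compare scoped and unscoped timings on the same repository.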

OR

if you don't want versions in your query results at all you can also disable 
indexing of versions:

- Remove or comment out the /Repository/SearchIndex element in your repository.xml

This change requires that you re-index all workspaces.

> I would really appreciate your thoughts as we are using Jackrabbit as a
> backend to a portal and migration from 1.1.1 to 1.2.1 changed portal
> performance dramatically.

Can you please provide examples of queries that changed in performance between 
the two versions?

regards
  marcel
