jackrabbit-users mailing list archives

From "Alessandro Bologna" <alessandro.bolo...@gmail.com>
Subject Query performances
Date Thu, 15 Mar 2007 16:34:50 GMT

We have run into some interesting behavior when running searches on a fairly
large repository (~1,000,000 nodes).
The test data is a tree of nodes of type nt:unstructured, referenceable,
with two numeric properties (a sequential count of the node and a
random number between 0 and the count). Each node has a reference to its
parent and up to 100 child nodes, and is named n<m>, where m is the index of
the node relative to its parent.
So, for instance, /load/n0 is the first node, /load/n1 the second to
Then each one of them has 100 children and so on, so that a valid path, for
instance, is /load/n23/n34/n50.
One node out of 6 also has an nt:file node attached, in order to test full
text searches. If requested, I can provide the code to create the test set.
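
To make the naming scheme concrete, here is a minimal sketch in plain Java (no JCR calls) that only builds the paths described above. The class and method names are hypothetical and are not taken from the actual test-set code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that illustrates the n<m> naming scheme only.
// It does not create any repository nodes.
public class LoadTreePaths {

    // Builds a path under /load from the child index chosen at each level,
    // e.g. pathFor(23, 34, 50) gives /load/n23/n34/n50.
    static String pathFor(int... indexes) {
        StringBuilder sb = new StringBuilder("/load");
        for (int index : indexes) {
            sb.append("/n").append(index);
        }
        return sb.toString();
    }

    // Lists the paths of the first `count` children (n0, n1, ...) of a parent path.
    static List<String> childPaths(String parent, int count) {
        List<String> paths = new ArrayList<>();
        for (int m = 0; m < count; m++) {
            paths.add(parent + "/n" + m);
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(pathFor(23, 34, 50));
        System.out.println(childPaths("/load", 100).size());
    }
}
```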

The strange behavior that prompted me to write to this mailing list is the
following:

Say that I am searching for a node that contains the word 'beatles' at some
level under the node /load/n40 and I use the following query:
/jcr:root/load/n40//*[jcr:contains(.,'beatles')] the execution time is
If I use instead:
/jcr:root/load/n40/*/*/*/*[jcr:contains(.,'beatles')] the execution time
is 19749ms

The second query, in theory, could execute faster than the first, because I
am providing more information (only nodes at the 4th level under /load/n40),
but it takes 10 times longer to execute.
Is there a reason why?

The other, far more worrisome problem appears to be the opposite.
I executed the following two queries:
/jcr:root/load/n50/n2/* ==> 931ms
/jcr:root/load/n50/n2/*/* ==> 661ms

The first returns all nodes one level below /load/n50/n2 and the second
returns those two levels below. There are no other nodes under that.
When I tried the following query, which would return the same nodes in one
operation, the result was surprising (in a bad way):
/jcr:root/load/n50/n2//* ==> 353769ms
The CPU goes to 100%, and I see in the Jackrabbit logs many entries similar to:
DocNumberCache: size=1024/1024, #accesses=17039, #hits=167, #misses=16872,
cacheRatio=1% (DocNumberCache.java, line 155)
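
As a sanity check on that log entry, the following sketch parses the counters and recomputes the hit ratio. The regular expression and the rounding are my own assumptions about the line format, not Jackrabbit code; the numbers come from the entry above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for a DocNumberCache log line; not part of Jackrabbit.
public class CacheLogCheck {

    // Assumed layout of the counters, based on the single entry quoted above.
    private static final Pattern COUNTERS =
            Pattern.compile("#accesses=(\\d+), #hits=(\\d+), #misses=(\\d+)");

    // Returns {accesses, hits, misses} parsed from the log line.
    static long[] parse(String line) {
        Matcher m = COUNTERS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No cache counters found in: " + line);
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    // Hit ratio as a percentage, rounded to the nearest integer (assumed rounding).
    static long hitRatioPercent(long accesses, long hits) {
        return accesses == 0 ? 0 : Math.round(100.0 * hits / accesses);
    }

    public static void main(String[] args) {
        String line = "DocNumberCache: size=1024/1024, #accesses=17039, #hits=167, "
                + "#misses=16872, cacheRatio=1%";
        long[] c = parse(line);
        System.out.println("hits + misses == accesses: " + (c[1] + c[2] == c[0]));
        System.out.println("hit ratio: " + hitRatioPercent(c[0], c[1]) + "%");
    }
}
```

With these counters, 167 hits out of 17039 accesses is just under 1%, so the 1024-entry cache is full and almost every lookup during this query misses it.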

and then finally, *some 5 minutes later*, I get the result.
Even if I restrict the query, it still takes the same time:
/jcr:root/load/n50/n2/n96//* and there's maybe only a hundred nodes under

I see exactly the same behavior if I use the SQL syntax: select * from
nt:base where jcr:path like '/load/n50/n2/n96/%'
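
For clarity, this is how the two forms of the descendant query relate for a given path. It is a minimal sketch with hypothetical helper names that only builds the query strings; it assumes the path contains no quotes or other characters that would need escaping.

```java
// Hypothetical helper that builds the two descendant queries from a node path.
// It produces plain strings only and does not talk to a repository.
public class DescendantQueries {

    // XPath form: all descendants of the node at `path`.
    static String xpathDescendants(String path) {
        return "/jcr:root" + path + "//*";
    }

    // SQL form: the same descendants, selected through a jcr:path LIKE pattern.
    // Assumes `path` needs no escaping (no quotes, % or _ characters).
    static String sqlDescendants(String path) {
        return "select * from nt:base where jcr:path like '" + path + "/%'";
    }

    public static void main(String[] args) {
        String path = "/load/n50/n2/n96";
        System.out.println(xpathDescendants(path));
        System.out.println(sqlDescendants(path));
    }
}
```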

The version of Jackrabbit is 1.2.2. The backend is Oracle 10g, and I am running
the application on Tomcat 5.5 with JDK 1.5 and 1GB assigned to the JVM (on

Does anybody have any idea why this is happening and if there is a
Alessandro Bologna

