jackrabbit-dev mailing list archives

From "David Johnson" <dbjohnso...@gmail.com>
Subject Query Performance and Optimization
Date Wed, 28 Feb 2007 05:49:59 GMT
We are exploring using Jackrabbit in a production environment. I have a
repository, created from our content, that has more than 100K nodes.
Several of our use cases need date range queries and also use 'order by'
frequently. We have noticed that query times are significantly slower
than we think they need to be. After warming up the repository (i.e.,
running the suite of queries once), for example:

"select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and
status <> 'hidden' order by publishDate desc" takes 500 ms to execute. This
is just the execution time; I am not actually using or accessing the
NodeIterator.

By contrast, "select * from Column where jcr:path like
'Gossip/ColumnName/Columns/%' and status <> 'hidden'" takes only 33 ms to
execute.

The XPath query

/jcr:root/Gossip/ColumnName/Columns//element(*,Column)[@publishDate >
xs:dateTime("way in the past") and @publishDate < xs:dateTime("way in the
future") and (@status != 'hidden')] order by @publishDate descending

takes 1096 ms to execute.

Clearly dates (ordering and ranges) have a significant impact on query
execution speed.
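For repeatable numbers like the ones above, a small harness that separates warm-up runs from timed runs and reports a median may help. This is a hypothetical sketch in plain Java; `QueryTimer` and `medianMicros` are illustrative names, not Jackrabbit API. With a JCR session, the timed task would presumably be a Runnable that calls `queryManager.createQuery(stmt, Query.SQL).execute()` (wrapping the checked `RepositoryException`), so only execution is timed and the result's NodeIterator is never touched.

```java
import java.util.Arrays;

class QueryTimer {

    /**
     * Runs task `warmup` times untimed, then `runs` times timed, and returns
     * the median elapsed time in microseconds (the upper median when runs is even).
     */
    static long medianMicros(Runnable task, int warmup, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // warm caches and the JIT; these results are discarded
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1_000;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload (hypothetical); with JCR this would call only
        // Query.execute(), leaving the NodeIterator untouched.
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("overflow");
            }
        };
        System.out.println("median: " + medianMicros(work, 3, 11) + " us");
    }
}
```

Reporting a median rather than a single run keeps one slow outlier (for example a garbage collection pause) from dominating a comparison such as 33 ms versus 500 ms.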

Digging into the internals of Jackrabbit, we have noticed that there is an
implementation of RangeQuery that essentially walks the results if the number
of query terms is greater than what Lucene can handle. From the Lucene
documentation, it looks like Filters are the recommended way of
implementing "large" range queries, and they also seem a natural fit for
matching node types, i.e., select * from Column.
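To make the Filter idea concrete, here is a conceptual sketch in plain Java. It uses no Lucene or Jackrabbit classes; the one-field index layout, the zero-padded `d%05d` date terms, and all class and method names are hypothetical. It contrasts expanding a range into one OR clause per distinct term, which is what runs into Lucene's BooleanQuery clause limit (1024 by default), with a filter that walks the matching terms once and sets one bit per matching document. A node-type restriction such as `select * from Column` is then just another bit set, combined with a bitwise AND.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangeFilterSketch {

    // Stand-in for Lucene's default BooleanQuery clause limit.
    static final int MAX_CLAUSES = 1024;

    /** Hypothetical date field: every document gets a distinct, sortable term. */
    static NavigableMap<String, List<Integer>> buildDateIndex(int maxDoc) {
        NavigableMap<String, List<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            index.computeIfAbsent(String.format("d%05d", doc), k -> new ArrayList<>()).add(doc);
        }
        return index;
    }

    /** Hypothetical node-type field: even docs are Column, odd docs are Article. */
    static Map<String, List<Integer>> buildTypeIndex(int maxDoc) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            String type = (doc % 2 == 0) ? "Column" : "Article";
            index.computeIfAbsent(type, k -> new ArrayList<>()).add(doc);
        }
        return index;
    }

    /** Term expansion: one OR clause per distinct term in [lower, upper]. */
    static int clauseCount(NavigableMap<String, List<Integer>> index, String lower, String upper) {
        return index.subMap(lower, true, upper, true).size();
    }

    /** Filter: walk the matching terms once and set a bit per document. */
    static BitSet rangeFilter(NavigableMap<String, List<Integer>> index,
                              String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (List<Integer> postings : index.subMap(lower, true, upper, true).values()) {
            for (int doc : postings) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Single-term filter, e.g. for a node type. */
    static BitSet termFilter(Map<String, List<Integer>> index, String term, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : index.getOrDefault(term, Collections.emptyList())) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 5000;
        NavigableMap<String, List<Integer>> dates = buildDateIndex(maxDoc);
        Map<String, List<Integer>> types = buildTypeIndex(maxDoc);
        String lower = "d01000";
        String upper = "d03999";

        int clauses = clauseCount(dates, lower, upper);
        System.out.println("clauses needed by term expansion: " + clauses
                + (clauses > MAX_CLAUSES ? " (over the limit)" : ""));

        BitSet hits = rangeFilter(dates, lower, upper, maxDoc);
        hits.and(termFilter(types, "Column", maxDoc)); // date range AND node type
        System.out.println("Column nodes in range: " + hits.cardinality());
    }
}
```

Running `main` prints a clause count of 3000 (over the limit) and 1500 matching Column nodes. In this sketch the filter's cost grows with the number of matching postings rather than with a clause limit, and bit sets like these could be cached and reused across queries.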

Is there any ongoing work on query optimization and performance? We would
be very interested in such work, and would be glad to offer any help we can.

-Dave
