jackrabbit-users mailing list archives

From "Michael Neale" <michael.ne...@gmail.com>
Subject Re: Query Performance and Optimization
Date Mon, 05 Mar 2007 07:11:31 GMT
I would like to second this.

I know from previous discussions that it is a design decision of Jackrabbit
not to work exclusively with an RDBMS. If it did, I would be all in favour of
leaning on the database to do the hard work.

But I presume Lucene is leaned on to do all the hard work instead (and it is
certainly capable). Even so, query performance seems to me a bit of
voodoo, and random, unless you dive into Jackrabbit itself. I definitely
think a lot of work can be done in that regard.

On 3/1/07, David Johnson <dbjohnson.e@gmail.com> wrote:
> We are exploring using Jackrabbit in a production environment.  I have a
> repository, created from our content, that has > 100K nodes.
> Several of our use cases need date range queries and also use 'order
> by' frequently.  We have noticed that the query time is significantly
> slower than necessary.  After warming up the repository (i.e., running the
> suite of queries once), some examples:
>
> "select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%'
> and status <> 'hidden' order by publishDate desc" takes 500 ms to execute.
> This is just the execution time; I am not actually using or accessing the
> NodeIterator.
>
> Whereas: "select * from Column where jcr:path like
> 'Gossip/ColumnName/Columns/%' and status <> 'hidden'" takes only 33 ms to
> execute.
>
> /jcr:root/Gossip/ColumnName/Columns//element(*,Column)[@publishDate >
> xs:dateTime("way in the past") and @publishDate < xs:dateTime("way in the
> future") and (@status != 'hidden')] order by @publishDate descending takes
> 1096 ms to execute.
>
> Clearly dates (ordering and ranges) have a significant impact on query
> execution speed.
>
> Digging into the internals of Jackrabbit, we have noticed that there is an
> implementation of RangeQuery that essentially walks the results if the
> number of query terms is greater than what Lucene can handle.  Reading the
> Lucene documentation, it looks like Filters are the recommended method of
> implementing "large" range queries, and they also seem like a natural fit
> for matching node types, i.e., select * from Column.
>
> Is there any ongoing work on query optimization and performance?  We would
> be very interested in such work, including offering any help that we can.
>
> -Dave
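
To make the Filter idea from the quoted message a bit more concrete, here is a
minimal sketch in plain Java. It is a toy model of an inverted index on a
single field, not Lucene or Jackrabbit code: the class RangeFilterSketch and
its methods are hypothetical names made up for illustration, and only
java.util is used. It contrasts the two ways a range can be evaluated:
expanding it into one clause per matching term (which is what runs into
Lucene's clause limit, 1024 by default as far as I recall) versus OR-ing the
postings of every term in the range into a single bit set of document ids and
intersecting that with the other hits.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy single-field inverted index; illustrative only, not the Lucene API. */
public class RangeFilterSketch {

    // term value (e.g. a publishDate in millis) -> ids of docs holding that value
    private final NavigableMap<Long, BitSet> postings = new TreeMap<>();
    private int maxDoc = 0;

    /** Index one document under the given field value. */
    public void add(int docId, long value) {
        postings.computeIfAbsent(value, k -> new BitSet()).set(docId);
        maxDoc = Math.max(maxDoc, docId + 1);
    }

    /**
     * Number of distinct terms strictly between lower and upper. A rewrite
     * into one clause per term would need this many clauses.
     */
    public int termsInRange(long lower, long upper) {
        return postings.subMap(lower, false, upper, false).size();
    }

    /**
     * Filter-style evaluation: OR the postings of every term in the
     * (exclusive) range into one bit set. The cost grows with the number of
     * terms, but there is no per-term query clause and so no clause limit.
     */
    public BitSet rangeFilter(long lower, long upper) {
        BitSet bits = new BitSet(maxDoc);
        for (BitSet docs : postings.subMap(lower, false, upper, false).values()) {
            bits.or(docs);
        }
        return bits;
    }

    /** Restrict the hits of some other query to the documents in the filter. */
    public static BitSet applyFilter(BitSet hits, BitSet filter) {
        BitSet result = (BitSet) hits.clone();
        result.and(filter);
        return result;
    }

    public static void main(String[] args) {
        RangeFilterSketch index = new RangeFilterSketch();
        // 100K documents, each with its own distinct "publishDate" value
        for (int doc = 0; doc < 100_000; doc++) {
            index.add(doc, 1_000_000L + doc);
        }
        long lower = Long.MIN_VALUE; // "way in the past"
        long upper = Long.MAX_VALUE; // "way in the future"
        System.out.println("terms in range: " + index.termsInRange(lower, upper));
        System.out.println("docs in filter: " + index.rangeFilter(lower, upper).cardinality());
    }
}
```

The point of the sketch is only that a wide date range over 100K nodes with
distinct timestamps expands to far more terms than a clause-per-term rewrite
can hold, while the bit set stays one bit per document. Whether Jackrabbit's
query layer can use a filter in this way is exactly the open question in this
thread.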
