jackrabbit-dev mailing list archives

From "David Johnson" <dbjohnso...@gmail.com>
Subject Re: Query Performance and Optimization
Date Thu, 01 Mar 2007 17:50:02 GMT
Any pointers and thoughts from the developers who have worked on the
LuceneQueryBuilder would be much appreciated.  As an idea, I was thinking of
running the query AST through an optimization pass before it is passed to the
query builder, perhaps in
org.apache.jackrabbit.core.query.lucene.QueryImpl.execute(), right before the
LuceneQueryBuilder.createQuery call.
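
To make the idea concrete, here is a minimal sketch of what such a pass could
look like.  It is only an illustration: the Node, Term and And types and the
cost estimates are hypothetical stand-ins, not Jackrabbit's query node classes,
and whether reordering clauses actually helps depends on how the query builder
and Lucene evaluate them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy query AST and rewrite pass; none of these types are Jackrabbit classes. */
final class QueryAstOptimizerSketch {

    interface Node {}

    /** Leaf constraint with an assumed, externally supplied cost estimate. */
    static final class Term implements Node {
        final String field;
        final String value;
        final int estimatedCost;

        Term(String field, String value, int estimatedCost) {
            this.field = field;
            this.value = value;
            this.estimatedCost = estimatedCost;
        }
    }

    /** Conjunction of child constraints. */
    static final class And implements Node {
        final List<Node> children;

        And(List<Node> children) {
            this.children = children;
        }
    }

    /** Flattens nested ANDs and orders the cheapest constraints first. */
    static Node optimize(Node node) {
        if (!(node instanceof And)) {
            return node; // leaves are returned unchanged
        }
        List<Node> flat = new ArrayList<>();
        for (Node child : ((And) node).children) {
            Node optimized = optimize(child);
            if (optimized instanceof And) {
                flat.addAll(((And) optimized).children); // AND(a, AND(b, c)) -> AND(a, b, c)
            } else {
                flat.add(optimized);
            }
        }
        flat.sort(Comparator.comparingInt(QueryAstOptimizerSketch::cost));
        return flat.size() == 1 ? flat.get(0) : new And(flat);
    }

    /** Nodes without a known cost sort last. */
    private static int cost(Node node) {
        return node instanceof Term ? ((Term) node).estimatedCost : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        Node query = new And(Arrays.asList(
                new Term("title", "report", 50),
                new And(Arrays.asList(
                        new Term("jcr:primaryType", "Column", 5),
                        new Term("author", "dave", 20)))));
        And optimized = (And) optimize(query);
        for (Node child : optimized.children) {
            Term t = (Term) child;
            System.out.println(t.field + " = " + t.value + " (cost " + t.estimatedCost + ")");
        }
    }
}
```

The pass returns a new tree and leaves the input untouched, so it could sit in
front of the query builder without the builder needing to know about it.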

Has anyone done any profiling on queries?  I have some data that I gathered
with the NetBeans profiler that I could share if anyone is interested.  Some
highlights:

org.apache.lucene.search.Searcher.search(...) and its children take 96% of
the time.
Of those children, the first "hit" in Jackrabbit code is
org.apache.jackrabbit.core.query.lucene.SharedFieldSortComparator.newComparator(...)
with 58% of the time; its child,
org.apache.jackrabbit.core.query.lucene.SharedFieldCache.getStringIndex(...),
accounts for all of that time.

Below that point, the biggest child is
org.apache.lucene.index.MultiTermDocs.next(), which takes the majority of the
time from there on down.

Any pointers/thoughts on writing an optimizer for Lucene, on alternate
indexing engines, or even on how to optimize queries would be appreciated.

-Dave

On 3/1/07, Christoph Kiehl <kiehl@subshell.com> wrote:
>
> David Johnson wrote:
>
> > Digging into the internals of Jackrabbit, we have noticed that there is an
> > implementation of RangeQuery that essentially walks the results if the # of
> > query terms is greater than what Lucene can handle.  Reading the Lucene
> > documentation, it looks like Filters are the recommended method of
> > implementing "large" range queries, and also seem like a natural for
> > matching node types - i.e., select * from Column
>
> As we are expecting to reach a count of 1.000.000+ nodes in one of our
> repositories I'm always interested in any performance improvements. Is anyone
> investigating this proposal? Or could anyone at least tell me if it's worth
> investigating? ;)
>
> Cheers,
> Christoph
>
>
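
For reference, the Filter idea from the quoted message can be sketched without
any Lucene classes.  The toy index below is an assumption made purely for
illustration (a sorted map from term to document ids); it only shows why a
filter-style range match does not need one query clause per term: the terms in
the range are walked once and their documents are marked in a bit set.

```java
import java.util.BitSet;
import java.util.TreeMap;

/** Toy single-field inverted index; a stand-in for illustration, not Lucene's API. */
final class RangeFilterSketch {

    /** Sorted term dictionary: term -> ids of the documents containing it. */
    private final TreeMap<String, int[]> postings = new TreeMap<>();
    private final int maxDoc;

    RangeFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void add(String term, int... docIds) {
        postings.put(term, docIds);
    }

    /** Returns one bit per document that has a term in [lower, upper], inclusive. */
    BitSet rangeBits(String lower, String upper) {
        BitSet bits = new BitSet(maxDoc);
        if (lower.compareTo(upper) > 0) {
            return bits; // empty range, nothing matches
        }
        // Walk only the terms inside the range; the number of terms is unbounded
        // because no per-term query clause is ever created.
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        RangeFilterSketch index = new RangeFilterSketch(8);
        index.add("2007-01", 0, 3);
        index.add("2007-02", 1);
        index.add("2007-03", 2, 5);
        index.add("2007-04", 4);
        System.out.println(index.rangeBits("2007-02", "2007-03"));
    }
}
```

The resulting bit set would then be intersected with the hits of the main
query.  The same shape would apply to node type matching, with one bit set per
node type.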
