jackrabbit-dev mailing list archives

From "David Johnson" <dbjohnso...@gmail.com>
Subject Re: Query Performance and Optimization
Date Wed, 14 Mar 2007 19:58:23 GMT
Both of these proposals sound great - particularly the additional caching in
DescendantSelfAxisQuery.  I think this would address the scenario for which I
suggested additional indexing earlier in this thread.  As I mentioned, in my
query test set DescendantSelfAxisQuery.DescendantSelfAxisScorer.next() takes
the most time, so any speed-up there would be great.

-Dave


On 3/14/07, Christoph Kiehl <christoph@sulu3000.de> wrote:
>
> Marcel Reutegger wrote:
> > Christoph Kiehl wrote:
> >>> I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791
> >>
> >> Are you working on this issue? Or should I try to implement something?
> >
> > I just started working on it ;)
>
> Great news ;)
>
> Now that you are working on implementing this cache on a per index reader
> basis, I got another suggestion for improvement ;)
>
> As I understand it, in DescendantSelfAxisQuery.DescendantSelfAxisScorer the
> contextHits are used to filter the subHits result so that it only includes
> nodes of the given context. The context is something like /foo/bar//*, which
> means all descendants of /foo/bar. Is that right?
> In our application the context for most of our queries is the same, so it
> would make a lot of sense to cache the contextHits for this context. There is
> already a todo in the constructor of DescendantSelfAxisScorer which probably
> aims at this.
> I would go even further and not only cache these contextHits, but cache
> contextHits per _node_ in a hierarchy, which means there is a BitSet for
> /foo/bar/bla[1], /foo/bar/bla[2] and so on. If I need the BitSet for
> /foo/bar//* I could just join the BitSets of the descendants. This would allow
> reusing the BitSets for different contexts. What do you think about this? It
> should improve performance a lot, and more so the larger the result set is
> and the less specific your context is.
>
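To make the per-node idea above concrete, here is a minimal sketch using plain
java.util.BitSet. It is not Jackrabbit code: the class ContextHitsCache, its
methods, the string path keys and the integer document numbers are all
hypothetical, and it assumes each cached BitSet holds the document numbers of
one node's direct children.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Jackrabbit.
class ContextHitsCache {
    // Assumed layout: node path -> document numbers of that node's direct children.
    private final Map<String, BitSet> childHitsByPath = new HashMap<>();

    void put(String nodePath, int... childDocNumbers) {
        BitSet bits = new BitSet();
        for (int doc : childDocNumbers) {
            bits.set(doc);
        }
        childHitsByPath.put(nodePath, bits);
    }

    // Hits for contextPath//* : OR together the cached sets of the context
    // node itself and of every cached node below it.
    BitSet descendantHits(String contextPath) {
        BitSet result = new BitSet();
        String prefix = contextPath + "/";
        for (Map.Entry<String, BitSet> entry : childHitsByPath.entrySet()) {
            String path = entry.getKey();
            if (path.equals(contextPath) || path.startsWith(prefix)) {
                result.or(entry.getValue());
            }
        }
        return result;
    }

    // Restrict the sub query hits to the given context without modifying the inputs.
    static BitSet filter(BitSet subHits, BitSet contextHits) {
        BitSet filtered = (BitSet) subHits.clone();
        filtered.and(contextHits);
        return filtered;
    }
}
```

The same cached sets can serve both /foo/bar//* and /foo/bar/bla[1]//*, which
is the reuse suggested above. A real implementation would also have to deal
with invalidation when nodes are added, moved or removed, which this sketch
ignores.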
> >> It seems like if I rewrite the following query from
> >>
> >> /foo/*[@foo:bar!='john' and @foo:bar!='doe']
> >>
> >> to
> >>
> >> /foo/*[not(@foo:bar='john' or @foo:bar='doe')]
> >>
> >> I get a better performance. Can you confirm this?
> >
> > Yes, I can. Basically because any != comparison is translated into: get
> > all nodes with the given property, then exclude the ones that match the
> > literal. Which is obviously much more expensive than just: get all nodes
> > that match a given literal.
>
> Wouldn't it make sense to rewrite all @foo:bar!='john' queries to
> not(@foo:bar='john') by default instead of creating a MatchAllQuery?
>
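The two evaluation strategies discussed above can be modelled as set
operations. This is a toy sketch based only on the description in this
thread, not on the actual Jackrabbit query classes: NotEqualsModel and its
methods are hypothetical names, and the BitSets stand in for sets of matching
nodes.

```java
import java.util.BitSet;

// Hypothetical model of the two query forms; not Jackrabbit API.
class NotEqualsModel {
    // @prop != 'x' as described in the thread: start from all nodes that
    // have the property, then exclude the ones equal to the literal.
    static BitSet notEquals(BitSet hasProperty, BitSet equalsLiteral) {
        BitSet result = (BitSet) hasProperty.clone();
        result.andNot(equalsLiteral);
        return result;
    }

    // not(@prop = 'x' or @prop = 'y'): start from the candidate nodes and
    // exclude every node that matches one of the literals.
    static BitSet notAnyOf(BitSet candidates, BitSet... equalsLiterals) {
        BitSet result = (BitSet) candidates.clone();
        for (BitSet equalsLiteral : equalsLiterals) {
            result.andNot(equalsLiteral);
        }
        return result;
    }
}
```

In this model the != form needs the potentially large "has the property" set
once per comparison, while the not(...) form only needs the equality matches.
Note that the two forms are not equivalent in the model: a candidate node
without the property is excluded by != but kept by not(...). Whether that
difference matters for a default rewrite is something to check against the
real implementation.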
> Cheers,
> Christoph
>
>
