jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Query Performance and Optimization
Date Fri, 16 Mar 2007 15:43:17 GMT
Christoph Kiehl wrote:
> As I understand it, in DescendantSelfAxisQuery.DescendantSelfAxisScorer the
> contextHits are used to filter the subHits result to only include nodes
> of the given context. The context is something like /foo/bar//*, which
> means all descendants of /foo/bar. Is that right?

yes, that's correct.

> In our application the context for most of our queries is the same, so 
> it would make a lot of sense to cache the contextHits for this context. 
> There is already a todo in the constructor of DescendantSelfAxisScorer 
> which probably aims at this.

no, not exactly. because the size of each BitSet used is equal to the overall
size of the index, they may become quite large. that's why I once thought it may
be useful to reuse BitSet instances, which is not the same as caching the result
of a query.
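
to illustrate the difference, here is a minimal sketch of what reusing BitSet instances could look like. BitSetPool is a hypothetical helper and not an existing Jackrabbit class; it only recycles the allocated memory and clears every set before handing it out again, so no query result is ever cached.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

// Hypothetical helper (not part of Jackrabbit): recycles index-sized BitSets
// so a scorer does not have to allocate a fresh one for every query.
final class BitSetPool {
    private final Deque<BitSet> free = new ArrayDeque<>();
    private final int numBits;

    // numBits would be the overall size of the index (an assumption here).
    BitSetPool(int numBits) {
        this.numBits = numBits;
    }

    // Returns a cleared BitSet, reusing a released instance when one is available.
    synchronized BitSet acquire() {
        BitSet bits = free.poll();
        return bits != null ? bits : new BitSet(numBits);
    }

    // Clears the BitSet and keeps it for the next caller; hits are not retained.
    synchronized void release(BitSet bits) {
        bits.clear();
        free.push(bits);
    }
}
```

a scorer would call acquire() when it starts collecting hits and release() when it is done.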

> I would go even further and not only cache these contextHits, but cache
> contextHits per _node_ in a hierarchy, which means there is a BitSet for
> /foo/bar/bla[1], /foo/bar/bla[2] and so on. If I need the BitSet for
> /foo/bar//* I could just join the BitSets of the descendants. This would
> allow reusing the BitSets for different contexts. What do you think about
> this? It should improve performance a lot the larger the result set is and
> the less specific your context is.

hmm, I'm not sure how you would implement that. joining the BitSets you
mentioned may as well be expensive once you reach a certain number of them.
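
as a rough sketch of the join, assuming the per-node hits are plain java.util.BitSet instances (ContextHitsJoin and its parameters are made up for illustration): every or() call walks the words of its argument, which can span the whole index, so the total work grows with the number of cached sets times the index size.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration (not Jackrabbit code) of joining per-node BitSets
// into the BitSet for a broader context such as /foo/bar//*.
final class ContextHitsJoin {
    // maxDoc is assumed to be the overall size of the index.
    static BitSet union(List<BitSet> perNodeHits, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (BitSet hits : perNodeHits) {
            // One pass over the words of 'hits' per cached set.
            result.or(hits);
        }
        return result;
    }
}
```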

furthermore, a BitSet for /foo/bar//* is very unstable in the sense that it will
change very frequently. with every change under /foo/bar a node gets a new
document number and we would have to create a new BitSet. I guess we would need
to find a way to efficiently modify an existing BitSet when:

- the index is updated (because of a change)
- index segments are merged (caused by a background thread)
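
to make the problem concrete, here is a naive sketch of carrying a cached BitSet over to new document numbers. the oldToNew mapping is purely an assumption for this example (I'm not claiming the index exposes such a table), and the loop still visits every set bit, so it is not the efficient solution we would need, only an illustration of what has to happen.

```java
import java.util.BitSet;

// Hypothetical illustration (not Jackrabbit code): rebuilds a cached BitSet
// after document numbers have changed due to an update or a segment merge.
final class DocNumberRemap {
    // oldToNew[i] is the new number of old document i, or -1 if it was deleted.
    static BitSet remap(BitSet oldHits, int[] oldToNew, int newMaxDoc) {
        BitSet result = new BitSet(newMaxDoc);
        for (int doc = oldHits.nextSetBit(0);
                doc >= 0 && doc < oldToNew.length;
                doc = oldHits.nextSetBit(doc + 1)) {
            int target = oldToNew[doc];
            if (target >= 0) {
                result.set(target);
            }
        }
        return result;
    }
}
```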

> Wouldn't it make sense to rewrite all @foo:bar!='john' queries to
> not(@foo:bar!='john') by default instead of creating a MatchAllQuery?

do you mean rewrite: @foo:bar!='john' to not(@foo:bar='john') ?

