jackrabbit-users mailing list archives

From "Alessandro Bologna" <alessandro.bolo...@gmail.com>
Subject Re: Query performances
Date Fri, 30 Mar 2007 15:49:24 GMT
Marcel,

just wanted to get back to you (and the list as well). I downloaded
jackrabbit-webapp-1.3-SNAPSHOT and ran the same tests again.
Performance is much better and queries seem to be much better optimized.
Congratulations on the improvements.
Alessandro



On 3/28/07, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
>
> Hi Alessandro,
>
> Alessandro Bologna wrote:
> > Now I have found another unusual behavior, and I was hoping you could
> > explain this too...
> > These queries have been executed in sequence (without restarting):
> >
> > Executing query: /jcr:root/load/n10/n33/*[@random>10000]
> > Query execution time:10245ms
> > Number of nodes:91
> >
> > Executing query: /jcr:root/load/n10/n33/*[@random>10000 and @random<10000000]
> > Query execution time:20409ms
> > Number of nodes:91
> >
> > Executing query: /jcr:root/load/n10/n33/*[@random>10000 and @random<10000000 and @random<10000001]
> > Query execution time:30053ms
> > Number of nodes:91
> >
> > I think that the execution time of the first query is already quite high
> > (an equality query takes just a few milliseconds),
>
> This has already been improved with
> http://issues.apache.org/jira/browse/JCR-804
>
> > but what I am more disconcerted about is that the second query (with two
> > conditions, the second being a 'dummy' one since it is true for each of the
> > 91 nodes returned by the first query) takes twice as long, and the third
> > query (with the third condition being basically the same as the second one)
> > takes three times as long.
> >
> > Typically I would expect an 'and' query to be executed on the results of
> > the first one, and therefore to add very little to the execution time.
> >
> > So the questions are:
> > 1. Why does it take so long to find 91 nodes in the first query?
>
> This is caused by:
> - MultiTermDocs is expensive on large value ranges (-> fixed in JCR-804)
> - @random>10000 (probably) selects a great number of nodes, which are later
>   excluded again because of the path constraint
>
> > 2. Why do the second and third queries take as much time as the first one
> > multiplied by the number of expressions?
>
> Each of the expressions is evaluated independently, and in a second step the
> results are 'and'ed together. Therefore the predominant cost in your query
> seems to be the individual expressions. Because each of the range
> expressions selects a lot of nodes, Lucene cannot optimize the execution
> well. See below for a workaround.
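A minimal sketch of the evaluation strategy described above, written as a simplified Java model rather than Jackrabbit's actual code: each predicate is checked against every indexed value on its own, and the per-predicate hit sets are intersected afterwards with `java.util.BitSet.and`. The class `IndependentAndModel`, the `checks` counter (a stand-in for query cost) and the sample values are all hypothetical.

```java
import java.util.BitSet;
import java.util.function.LongPredicate;

// Simplified, hypothetical model of an 'and' query whose expressions are
// evaluated independently and intersected afterwards. Not Jackrabbit code.
public class IndependentAndModel {

    // Number of per-value checks performed; stands in for query cost.
    static long checks = 0;

    // Evaluates one predicate against every indexed value, ignoring the others.
    static BitSet evaluate(long[] values, LongPredicate predicate) {
        BitSet hits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            checks++;
            if (predicate.test(values[doc])) {
                hits.set(doc);
            }
        }
        return hits;
    }

    // Evaluates each expression on its own, then 'and's the hit sets together.
    static BitSet andAll(long[] values, LongPredicate... predicates) {
        BitSet result = null;
        for (LongPredicate predicate : predicates) {
            BitSet hits = evaluate(values, predicate);
            if (result == null) {
                result = hits;
            } else {
                result.and(hits);
            }
        }
        return result == null ? new BitSet() : result;
    }

    public static void main(String[] args) {
        long[] values = new long[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 10L; // hypothetical @random values
        }
        checks = 0;
        int one = andAll(values, v -> v > 10000).cardinality();
        System.out.println(one + " hits, " + checks + " checks");
        // prints "998999 hits, 1000000 checks"

        checks = 0;
        int three = andAll(values, v -> v > 10000, v -> v < 10000000, v -> v < 10000001).cardinality();
        System.out.println(three + " hits, " + checks + " checks");
        // prints "998999 hits, 3000000 checks"
    }
}
```

Because no expression benefits from the results of the others, three range expressions cost roughly three times as much as one in this model, which is consistent with the timings quoted above.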
>
> > 3. is there a workaround to do range queries?
>
> Partitioning the random property into multiple properties may help. The
> basic idea is that you split the random number into a sum of multiple
> values, one property per decimal digit:
>
> @random = 34045
>
> would become:
>
> @random1 = 5
> @random10 = 4
> @random100 = 0
> @random1000 = 4
> @random10000 = 3
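A minimal Java sketch of the split, assuming non-negative integer values and a fixed number of decimal places; the class `RandomPartition`, the method `split` and its parameters are hypothetical. Each resulting entry could then be stored on the node with the JCR `Node.setProperty(String, long)` method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RandomPartition {

    // Splits a non-negative value into one entry per decimal place, named
    // prefix + place value (prefix1, prefix10, prefix100, ...).
    // Rejects values that need more than 'digits' places.
    static Map<String, Long> split(String prefix, long value, int digits) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        Map<String, Long> props = new LinkedHashMap<>();
        long place = 1;
        long rest = value;
        for (int i = 0; i < digits; i++) {
            props.put(prefix + place, rest % 10); // digit at this place value
            rest /= 10;
            place *= 10;
        }
        if (rest != 0) {
            throw new IllegalArgumentException(value + " needs more than " + digits + " digits");
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(split("random", 34045, 5));
        // prints {random1=5, random10=4, random100=0, random1000=4, random10000=3}
    }
}
```

The fixed digit count matters: every node must carry the same set of properties, or the generated range queries would silently miss nodes.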
>
> Later, if you search for all nodes whose random value is 12000 or larger
> (assuming all values have at most five digits), you would have a query:
> //*[(@random10000 = 1 and @random1000 >= 2) or (@random10000 >= 2)]
>
> Because each of the split-up properties has only a few distinct values,
> Lucene can optimize the query execution much better.
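The example query can be generalized. A hypothetical Java helper, assuming the digit properties above and values that fit in `digits` decimal places, builds the XPath predicate for "value >= threshold" by comparing digits from the most significant one down: either a higher digit is larger, or all higher digits are equal and the current one is at least as large. The names `SplitRangeQuery`, `atLeast` and `prop` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRangeQuery {

    // XPath attribute name for the digit at the given place value (1, 10, 100, ...).
    static String prop(String prefix, long place) {
        return "@" + prefix + place;
    }

    // Builds a predicate body that is true when the split value is >= threshold.
    // Assumes every node stores prefix1 .. prefix(10^(digits-1)), one digit each.
    static String atLeast(String prefix, long threshold, int digits) {
        long top = 1;
        for (int i = 1; i < digits; i++) {
            top *= 10;
        }
        if (threshold < 0 || threshold / top >= 10) {
            throw new IllegalArgumentException(threshold + " does not fit in " + digits + " digits");
        }
        // Threshold digits and their place values, most significant first.
        long[] digit = new long[digits];
        long[] place = new long[digits];
        long p = top;
        for (int i = 0; i < digits; i++) {
            place[i] = p;
            digit[i] = (threshold / p) % 10;
            p /= 10;
        }
        // Below the last non-zero threshold digit everything is zero, so '>='
        // on that digit suffices and no lower clauses are needed.
        int last = 0;
        for (int i = 0; i < digits; i++) {
            if (digit[i] != 0) {
                last = i;
            }
        }
        List<String> clauses = new ArrayList<>();
        StringBuilder equalPrefix = new StringBuilder();
        for (int i = 0; i <= last; i++) {
            String op = (i == last) ? " >= " : " > ";
            // A digit never exceeds 9, so a '> 9' clause could never match.
            if (i == last || digit[i] < 9) {
                clauses.add("(" + equalPrefix + prop(prefix, place[i]) + op + digit[i] + ")");
            }
            equalPrefix.append(prop(prefix, place[i])).append(" = ").append(digit[i]).append(" and ");
        }
        return String.join(" or ", clauses);
    }

    public static void main(String[] args) {
        System.out.println("//*[" + atLeast("random", 12000, 5) + "]");
        // prints //*[(@random10000 > 1) or (@random10000 = 1 and @random1000 >= 2)]
    }
}
```

For a strict "greater than" comparison, `atLeast` can be called with `threshold + 1`. For 12000 the generated clauses are equivalent to the hand-written query above, just listed in a different order.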
>
> regards
> marcel
>
