jackrabbit-dev mailing list archives

From Christoph Kiehl ...@sulu3000.de>
Subject Re: DescendantSelfAxisWeight ChildAxisQuery performance
Date Fri, 30 Nov 2007 11:59:17 GMT
Ard Schrijvers wrote:

>>> Since my "stuff//*[@count]" gives me 1.200.000, it makes perfect sense
>>> to users I think, that even with our patches and a working cache,
>>> retaining them all would be slow. But if I set the limit to 1 or 10, I
>>> would expect to have performance (certainly when you have not
>>> implemented any AccessManager).
>>> But, if I set limit to 1, why would we have to check all 1.200.000
>>> parents whether the path is correct?
>> I'm not quite sure if this is a valid/common use case. I can't imagine
>> doing a query like this without using an "order by" clause, because
>> without an "order by" you will just get a random node. But if you use
>> an "order by" you need to get all nodes first anyway.
> This is not my point. Whether you have an order by or not, Lucene will
> compute the score of all hits anyway. So, no order by does not mean
> that Lucene does not order: it orders on score (but of course you already
> know that :-) )
> So, my point holds with and without order by.

Ok, if you use jcr:contains() it certainly makes sense to use Lucene's
default ordering by score. As soon as you are ordering by specific
properties like modification date you won't win anything. I just wanted
to express that this solution only works for a limited number of use cases.
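
To make the difference concrete, here is a minimal sketch in plain Java. It
is not Jackrabbit or Lucene code: the Hit class, the path-prefix test that
stands in for the parent lookup, and all method names are hypothetical. It
only models the two cases discussed above: hits that already arrive in score
order, where the hierarchy check can stop at the limit, and an ordering by a
property, modelled here as "check everything, then sort".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, not Jackrabbit code: a hit is a path plus a score and a date.
final class LazyPathCheckSketch {

    static final class Hit {
        final String path;
        final double score;
        final long lastModified;

        Hit(String path, double score, long lastModified) {
            this.path = path;
            this.score = score;
            this.lastModified = lastModified;
        }
    }

    private int pathChecks = 0; // number of hierarchy checks performed so far

    int pathChecks() {
        return pathChecks;
    }

    // Stand-in for the expensive parent/ancestor lookup (assumption: a prefix test).
    private boolean isDescendantOf(Hit hit, String ancestor) {
        pathChecks++;
        return hit.path.startsWith(ancestor + "/");
    }

    // Hits are assumed to arrive already ordered by score, so we can stop at the limit.
    List<Hit> lazyByScore(List<Hit> hitsInScoreOrder, String ancestor, int limit) {
        List<Hit> result = new ArrayList<>();
        for (Hit hit : hitsInScoreOrder) {
            if (result.size() >= limit) {
                break;
            }
            if (isDescendantOf(hit, ancestor)) {
                result.add(hit);
            }
        }
        return result;
    }

    // Models "get all nodes first": check every hit, then sort by the property.
    List<Hit> eagerByLastModified(List<Hit> hits, String ancestor, int limit) {
        List<Hit> filtered = new ArrayList<>();
        for (Hit hit : hits) {
            if (isDescendantOf(hit, ancestor)) {
                filtered.add(hit);
            }
        }
        filtered.sort(Comparator.comparingLong((Hit h) -> h.lastModified).reversed());
        return new ArrayList<>(filtered.subList(0, Math.min(limit, filtered.size())));
    }

    // Generates n hits in descending score order, alternating between two subtrees.
    static List<Hit> sampleHits(int n) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String root = (i % 2 == 0) ? "/stuff/" : "/other/";
            hits.add(new Hit(root + "node" + i, n - i, i));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Hit> hits = sampleHits(1000);
        LazyPathCheckSketch lazy = new LazyPathCheckSketch();
        lazy.lazyByScore(hits, "/stuff", 10);
        LazyPathCheckSketch eager = new LazyPathCheckSketch();
        eager.eagerByLastModified(hits, "/stuff", 10);
        System.out.println("lazy checks: " + lazy.pathChecks()
                + ", eager checks: " + eager.pathChecks());
    }
}
```

In this toy model the lazy variant only pays for as many checks as it needs
to fill the limit, while the eager variant always checks every hit. How much
of that carries over to the real query classes is exactly the open question.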

>> 1) The total result size will be very inaccurate until you have
>> fetched the whole result set. Even now it might be inaccurate
>> because of AccessManager checks, but doing a lazy parent-child
>> relation check will make it almost unusable.
> You might warn that fetching a total result size is slow. Without having
> to know the total, it should not have to be slow.

Ok. That might be unexpected behaviour, but it is a valid solution.
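
A rough sketch of what that could look like, again in plain Java and under
stated assumptions: the class name, the predicate standing in for the
parent-child check, and the choice to report -1 for "size not known yet"
(borrowed from the convention of javax.jcr.RangeIterator.getSize()) are all
illustrative, not the actual Jackrabbit implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical lazy result iterator: candidates are checked only when requested.
final class LazySizeSketch implements Iterator<String> {

    private final Iterator<String> candidates;
    private final Predicate<String> hierarchyCheck; // stand-in for the parent-child check
    private String next;            // next confirmed hit, or null if none is buffered
    private int confirmed = 0;      // hits handed out so far
    private boolean exhausted = false;

    LazySizeSketch(List<String> candidatePaths, Predicate<String> hierarchyCheck) {
        this.candidates = candidatePaths.iterator();
        this.hierarchyCheck = hierarchyCheck;
    }

    // Exact size once hasNext() has reported the end, otherwise -1 for "unknown".
    long getSize() {
        return exhausted ? confirmed : -1;
    }

    @Override
    public boolean hasNext() {
        // Advance through candidates until one passes the check or none are left.
        while (next == null && candidates.hasNext()) {
            String candidate = candidates.next();
            if (hierarchyCheck.test(candidate)) {
                next = candidate;
            }
        }
        if (next == null) {
            exhausted = true;
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        confirmed++;
        return result;
    }
}
```

A caller that only wants the first few hits never triggers the remaining
checks; a caller that needs the total has to iterate to the end first, which
is the slow path being warned about.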

>> 2) DescendantSelfAxisQueries and ChildAxisQueries are not only used
>> as a final selector but can also be used inside a query like this:
>> 	stuff//*[@bar='text' and @foo/count]
>> You probably can't calculate @foo/count lazily.
> @foo/count should probably be foo/@count, shouldn't it? I haven't yet used
> DescendantSelfAxisQueries and ChildAxisQueries in these kinds of queries,
> but I see your point

Yes, I meant foo/@count of course. ;)

>> I know what you are talking about. That's why I don't use any
>> hierarchical queries at all. My queries all look like:
>> 	//element(*, nt:specific-node-type)[@count]
>> So I'm distinguishing my nodes only by node type or sometimes
>> mixins instead of by paths.
> I already understood that (aware) people are using it like this (but
> what about the unaware people). But, suppose I have articles in
> different languages, with different initial paths, and I want the 10
> last modified from some language. It doesn't make sense that I need to
> make articles for every language a different nodetype, because of
> DescendantSelfAxisQueries and ChildAxisQueries.

My current solution is definitely not the way to go for the future!

> I also have the idea that it will at least be extremely hard, *but*, I
> also wanted to emphasize that if we just look at the problem from a
> bird's-eye view, we must agree that checking all parent paths doesn't
> really make sense in some cases (certainly when the number of hits is
> very large)

Agreed ;)

> Anyway, perhaps we just have to think a little harder. Not everything
> has to be simple :-) 

I'll try to think a little harder ;)

