jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Search performance : MultiIndex
Date Tue, 30 Oct 2007 09:56:03 GMT
Ard Schrijvers wrote:
>> Christoph Kiehl wrote:
>> I would really like to find a solution to those problems. 
>> Maybe we should use some additional kind of index for 
>> resolving parent-child relations. Do you have any ideas yet 
>> how to improve performance in those areas?
> AFAICS, when we want to solve it within Lucene with querying, we will
> have a trade-off between "fast searching" and "fast moving of nodes"
> (I'll get back on this one)

I'd like to keep a move operation cheap but I also agree that hierarchical query 
operations need to be improved.

> /documents/en/news//*[@modificationDate] order by @modificationDate
> Typically, a news folder contains tens of thousands of items, and this
> query is not possible with the current Jackrabbit impl (at least, my
> experience is that for > 10,000 docs this query takes multiple seconds,
> while I need the result in < 50 ms (50 ms is really the max IMO)).

We should definitely try to execute such queries in a reasonable time. I think 
this is a very common use case. Can you please create a Jira issue? I'm not sure 
if the hierarchy is the issue here or just the fact that lots of nodes need to 
be ordered. Do you have more insight on this?

> Obviously, this only works when I index a node's path in some Lucene
> field. So a node with path /documents/en/news/2007/10/14/item.xml
> would have a Lucene field that contains the terms
> '/documents/en/news/2007/10/14/item.xml'
> '/documents/en/news/2007/10/14'
> '/documents/en/news/2007/10'
> '/documents/en/news/2007'
> '/documents/en/news'
> '/documents/en'
> '/documents'
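
To make the quoted term list concrete, here is a minimal sketch of how the
ancestor-or-self terms for a path could be generated before they are added to
an index field. The class and method names (PathTerms, ancestorOrSelf) are
hypothetical and not part of Jackrabbit or Lucene; the sketch only assumes
absolute, '/'-separated paths and leaves out the root path itself, as in the
list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Jackrabbit or Lucene API.
final class PathTerms {

    private PathTerms() {
    }

    /**
     * Returns the given absolute path followed by each of its ancestor
     * paths, deepest first. The root path "/" is not included.
     */
    static List<String> ancestorOrSelf(String absPath) {
        if (absPath == null || !absPath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + absPath);
        }
        String current = absPath;
        // Drop trailing slashes so "/documents/" is treated like "/documents".
        while (current.length() > 1 && current.endsWith("/")) {
            current = current.substring(0, current.length() - 1);
        }
        List<String> terms = new ArrayList<>();
        // Emit the current path, then cut off its last segment, until
        // only the root (or nothing) is left.
        while (current.length() > 1) {
            terms.add(current);
            current = current.substring(0, current.lastIndexOf('/'));
        }
        return terms;
    }
}
```

Calling PathTerms.ancestorOrSelf("/documents/en/news/2007/10/14/item.xml")
yields the seven terms listed above, in the same order. With such terms in
the index, a descendant query under /documents/en/news becomes a single
exact-term lookup. The cost is the one already raised in this thread: moving
a node changes the stored terms of every node below it.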

Maybe this approach can be turned into a clever hierarchical cache, without the 
need to index the whole path with each node?

