Well, the problem with that approach is the following:
Assume you have a tree of nodes under /a, let's say 10 million nodes. Then a
user renames /a to /b. With path entries stored in the index, all 10 million
nodes would have to be re-indexed. In the current implementation a rename is
very efficient and takes just a couple of milliseconds, because the nodes in
the index are only linked to their parent through a parent UUID. Renaming a
node simply means an update of one node (document) in the index.
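
To illustrate the point, here is a minimal sketch of an index where every
document only knows its own name and its parent's UUID. It is not Jackrabbit
code: the class ParentLinkedIndex, its methods and the docsWritten counter
are made up for this example. It only shows why a rename touches a single
document, no matter how many descendants there are.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy index, not the Jackrabbit implementation.
class ParentLinkedIndex {
    /** One index document per node: its name and its parent's UUID (null for the root). */
    static final class Doc {
        String name;
        final String parentUuid;
        Doc(String name, String parentUuid) {
            this.name = name;
            this.parentUuid = parentUuid;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();

    /** Counts how many documents have been written, to make the rename cost visible. */
    int docsWritten = 0;

    void add(String uuid, String name, String parentUuid) {
        docs.put(uuid, new Doc(name, parentUuid));
        docsWritten++;
    }

    /** A rename rewrites exactly one document; descendants are untouched. */
    void rename(String uuid, String newName) {
        docs.get(uuid).name = newName;
        docsWritten++;
    }

    /** The path is not stored; it is resolved by walking the parent links up to the root. */
    String path(String uuid) {
        StringBuilder sb = new StringBuilder();
        for (Doc d = docs.get(uuid); d != null && d.parentUuid != null; d = docs.get(d.parentUuid)) {
            sb.insert(0, "/" + d.name);
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }
}
```

The price for the cheap rename is paid at query time: resolving a path or a
hierarchy constraint means following the parent links.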
But I agree with both of you that there is a lot of potential for optimizing
path/hierarchy resolution in the Lucene query handler in Jackrabbit. Some
optimization is already done by caching the child->parent link information,
e.g. see:
http://svn.apache.org/repos/asf/jackrabbit/tags/1.2.3/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/CachingIndexReader.java
(-> the field called 'parents')
In the end, that is what ChildAxisQuery and DescendantSelfAxisQuery use.
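
A rough sketch of how such a cached child->parent mapping can answer
hierarchy questions without reading documents from the index. This is an
illustration only, not the actual CachingIndexReader code: the class
ParentCache, the int[] layout and the -1 marker for the root are assumptions
made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the real 'parents' field may be organized differently.
class ParentCache {
    /** parents[doc] holds the document number of the parent, or -1 for the root. */
    private final int[] parents;

    ParentCache(int[] parents) {
        this.parents = parents.clone();
    }

    /** Walks up the cached parent links; the cost is proportional to the depth of the node. */
    boolean isDescendantOrSelf(int doc, int ancestor) {
        for (int d = doc; d != -1; d = parents[d]) {
            if (d == ancestor) {
                return true;
            }
        }
        return false;
    }

    /** Child axis: all documents whose cached parent is the given document. */
    List<Integer> children(int parent) {
        List<Integer> result = new ArrayList<>();
        for (int doc = 0; doc < parents.length; doc++) {
            if (parents[doc] == parent) {
                result.add(doc);
            }
        }
        return result;
    }
}
```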
regards
marcel
Michael Neale wrote:
> Yeah I would +1 to that, it's something I do fairly often (there is often a
> lot of info in a path that is relevant to a query - given that we have gone
> ahead and nicely partitioned our content !).
>
> On 3/13/07, David Johnson <dbjohnson.e@gmail.com> wrote:
>>
>> As another example, for each node, perhaps every potential parent path
>> could be added to the index - as an example a node at /a/b/c/d/e/f/g
>> would have index entries:
>>
>> path1: /a
>> path2: /a/b
>> path3: /a/b/c
>> path4: /a/b/c/d
>> path5: /a/b/c/d/e
>> path6: /a/b/c/d/e/f
>>
>> so queries for specific sub-paths - e.g., select * from my:type where
>> jcr:path like '/a/b/c/%' could be mapped to a direct lucene match query,
>> i.e., path3 = /a/b/c
>>
>> The index entry to use for the Lucene query could be determined easily by
>> simple parsing of the path specified in the query.
>>
>> Perhaps something like this is already in the code. Are ChildAxisQuery
>> and DescendantSelfAxisQuery currently used for cases like this?
>>
>> -Dave
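
For reference, the per-ancestor path entries proposed in the quoted message
could be sketched roughly as follows. This is only an illustration of the
proposal, not existing Jackrabbit code: the class AncestorPathFields and
both methods are hypothetical, and termFor assumes the pattern ends in "/%".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed scheme.
class AncestorPathFields {
    /** Builds the entries path1..pathN, one per ancestor of the given absolute path. */
    static Map<String, String> fieldsFor(String path) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] segments = path.substring(1).split("/");
        StringBuilder prefix = new StringBuilder();
        // Stop before the last segment: the node itself is not one of its ancestors.
        for (int i = 0; i < segments.length - 1; i++) {
            prefix.append('/').append(segments[i]);
            fields.put("path" + (i + 1), prefix.toString());
        }
        return fields;
    }

    /** Maps a pattern such as '/a/b/c/%' to the pair { field name, value }. */
    static String[] termFor(String likePattern) {
        String prefix = likePattern.substring(0, likePattern.length() - 2); // strip "/%"
        int depth = prefix.substring(1).split("/").length;
        return new String[] { "path" + depth, prefix };
    }
}
```

With these entries, every descendant stores the names of all its ancestors,
which is exactly why renaming /a would force all nodes below it to be
re-indexed.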