jackrabbit-dev mailing list archives

From "David Johnson" <dbjohnso...@gmail.com>
Subject Re: Query Performance and Optimization
Date Mon, 12 Mar 2007 22:53:46 GMT
As another example, for each node, every ancestor path could perhaps be added
to the index. For instance, a node at /a/b/c/d/e/f/g would have these index
entries:

path1: /a
path2: /a/b
path3: /a/b/c
path4: /a/b/c/d
path5: /a/b/c/d/e
path6: /a/b/c/d/e/f

So queries for a specific subtree, e.g. select * from my:type where
jcr:path like '/a/b/c/%', could be mapped to a direct Lucene term query, i.e.
path3 = /a/b/c

The index field to use in the Lucene query could be determined easily by
counting the segments of the path given in the query (/a/b/c has three, so path3).
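A rough Java sketch of the scheme, for illustration only: the field names path1, path2, ... follow the example above, while the class and method names (AncestorPathFields, ancestorFields, termForDescendantPattern) are hypothetical and not part of Jackrabbit or Lucene. It only produces the field/value pairs; adding them to a Lucene Document or building the term query is left out so the sketch has no dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the proposed "ancestor path" fields;
// not part of Jackrabbit's indexer.
final class AncestorPathFields {

    // One entry per proper ancestor: path1=/a, path2=/a/b, ...
    static Map<String, String> ancestorFields(String nodePath) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (nodePath.equals("/")) {
            return fields; // the root node has no ancestors
        }
        String[] segments = nodePath.substring(1).split("/");
        StringBuilder prefix = new StringBuilder();
        // Stop before the last segment: a node's own path is not an ancestor.
        for (int i = 0; i < segments.length - 1; i++) {
            prefix.append('/').append(segments[i]);
            fields.put("path" + (i + 1), prefix.toString());
        }
        return fields;
    }

    // Maps "/x/y/z/%" to the field/value pair a term query would match,
    // e.g. {"path3", "/a/b/c"}. Only a literal path followed by "/%" is
    // handled; '%' or '_' (both LIKE wildcards) elsewhere are rejected.
    static String[] termForDescendantPattern(String likePattern) {
        if (!likePattern.startsWith("/") || !likePattern.endsWith("/%")
                || likePattern.length() < 4) {
            throw new IllegalArgumentException("expected /<path>/%: " + likePattern);
        }
        String base = likePattern.substring(0, likePattern.length() - 2);
        if (base.contains("%") || base.contains("_") || base.endsWith("/")) {
            throw new IllegalArgumentException("unsupported pattern: " + likePattern);
        }
        // The number of segments in the base path selects the field: /a/b/c -> path3.
        int depth = base.substring(1).split("/").length;
        return new String[] {"path" + depth, base};
    }

    public static void main(String[] args) {
        System.out.println(ancestorFields("/a/b/c/d/e/f/g"));
        // {path1=/a, path2=/a/b, path3=/a/b/c, path4=/a/b/c/d, path5=/a/b/c/d/e, path6=/a/b/c/d/e/f}
        String[] term = termForDescendantPattern("/a/b/c/%");
        System.out.println(term[0] + " = " + term[1]); // path3 = /a/b/c
    }
}
```

Because only proper ancestors are indexed, path3 = /a/b/c matches exactly the strict descendants of /a/b/c, which is what like '/a/b/c/%' asks for.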

Perhaps something like this is already in the code.  Are ChildAxisQuery and
DescendantSelfAxisQuery currently used for cases like this?


On 3/12/07, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
> David Johnson wrote:
> > I think I was again focusing on range queries and giving Lucene some way of
> > filtering out subsets of the document set, so that the whole document set
> > wouldn't have to be walked.  For the date range query, the from and to dates
> > would most likely share some of their most significant bytes; these bytes
> > could just be passed to Lucene as a direct match, thereby reducing the
> > subset of the collection that would be walked.  If the range query is fixed,
> > this "optimization" would be unnecessary.  Nevertheless, I still wonder if
> > there is additional information that could be stored in Lucene to augment
> > the index and improve query processing.
> ah, now I see. yes, that might help in some cases, e.g. you could say "get me
> all documents with a year value of 2007 and a month value of 7", which would
> be equivalent to a range query from 2007-07-01 to 2007-07-31.
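A minimal sketch of the most-significant-bytes idea, assuming dates are indexed as fixed-width yyyy-MM-dd strings so that string order matches date order. The class and method names are hypothetical, and a plain list scan stands in for the index lookup. Any value between the two bounds necessarily starts with their common prefix, so the exact prefix match never drops a real hit; for 2007-07-01 to 2007-07-31 the prefix is "2007-07-", i.e. the year/month match described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: narrow with an exact prefix match first,
// then apply the full range check only to the survivors.
// Assumes fixed-width ISO-8601 date strings (yyyy-MM-dd).
final class PrefixNarrowedRange {

    // Longest common prefix of two strings, e.g. "2007-07-" for July 2007.
    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    // Returns the values in [from, to]; the range comparison runs only on prefix matches.
    static List<String> inRange(List<String> values, String from, String to) {
        String prefix = commonPrefix(from, to);
        List<String> result = new ArrayList<>();
        for (String v : values) {
            if (!v.startsWith(prefix)) {
                continue; // rejected by the cheap exact prefix match alone
            }
            if (v.compareTo(from) >= 0 && v.compareTo(to) <= 0) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dates = List.of("2007-06-30", "2007-07-01", "2007-07-15",
                                     "2007-07-31", "2007-08-01");
        System.out.println(commonPrefix("2007-07-01", "2007-07-31")); // 2007-07-
        System.out.println(inRange(dates, "2007-07-01", "2007-07-31"));
        // [2007-07-01, 2007-07-15, 2007-07-31]
    }
}
```

The benefit depends on how close the bounds are: for 2007-06-15 to 2007-08-15 the common prefix is only "2007-0", so the pre-filter narrows much less and does not line up with a month boundary.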
> > In this case I was considering using the node UUID as the cross-index join
> > parameter.  Still, there is the problem of combining the results from two
> > different indexes.
> there are two issues with this approach:
> 1) getting the UUID requires Lucene to load the document
> 2) implementing an *efficient* join across system boundaries is not easy,
> even if the documents are sorted.
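For reference, the in-memory part of such a join is a plain sort-merge intersection, sketched here in Java. It assumes each index has already produced its hits as a list of distinct UUID strings sorted in the same order; the class name and the placeholder IDs are hypothetical. It says nothing about the cost of obtaining those UUIDs from Lucene (point 1) or of moving results across system boundaries (point 2), which is where the real difficulty lies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: intersect two sorted lists of distinct UUID strings
// with a single linear pass (sort-merge join on the UUID).
final class UuidMergeJoin {

    static List<String> intersectSorted(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                out.add(left.get(i)); // present in both result sets
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left cursor is behind: advance it
            } else {
                j++; // right cursor is behind: advance it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fromIndexA = List.of("uuid-1", "uuid-3", "uuid-5");
        List<String> fromIndexB = List.of("uuid-2", "uuid-3", "uuid-5", "uuid-7");
        System.out.println(intersectSorted(fromIndexA, fromIndexB)); // [uuid-3, uuid-5]
    }
}
```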
> >> 3) Use the database to provide the indexing structures.
> >>
> >> To me this seems to be a very interesting option, though it requires
> >> considerable effort.
> >
> > Yes, I agree, this is an interesting option, and it does seem that it would
> > take a fair amount of effort.  Your comments on the same thread on the user
> > list seem like a start to the thought process needed.  I am not very
> > familiar with the details of the PM (persistence manager), although I do
> > think that bringing together data storage and indexing will help improve
> > query processing speed, as well as help with some data integrity issues
> > that have been discussed in other threads.
> >
> > Over the weekend, I will see if I can come up with a solution to the range
> > query issue discussed above.
> great.
> regards
>   marcel
