jackrabbit-dev mailing list archives

From "Michael Neale" <michael.ne...@gmail.com>
Subject Re: Query Performance and Optimization
Date Tue, 13 Mar 2007 00:27:28 GMT
Yeah, I would +1 that. It's something I do fairly often (there is often a
lot of info in a path that is relevant to a query, given that we have gone
ahead and nicely partitioned our content!).

On 3/13/07, David Johnson <dbjohnson.e@gmail.com> wrote:
>
> As another example, for each node, perhaps every potential parent path
> could be added to the index - as an example a node at /a/b/c/d/e/f/g would
> have index entries:
>
> path1: /a
> path2: /a/b
> path3: /a/b/c
> path4: /a/b/c/d
> path5: /a/b/c/d/e
> path6: /a/b/c/d/e/f
>
> so queries for specific sub-paths - e.g., select * from my:type where
> jcr:path like '/a/b/c/%' - could be mapped to a direct Lucene match query,
> i.e., path3 = /a/b/c
>
> The index entry to use for the Lucene query could be determined easily by
> simple parsing of the path specified in the query.
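
A minimal sketch of the scheme quoted above, under stated assumptions: the
field names path1..pathN follow the example in the message, while the class
and method names are hypothetical and are not part of Jackrabbit or Lucene.
It only computes the field/value pairs; wiring them into an actual Lucene
document or term query is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Jackrabbit code.
class AncestorPathFields {

    // Returns one entry per proper ancestor of an absolute node path,
    // keyed "path<depth>" as in the message above.
    static Map<String, String> ancestorFields(String nodePath) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] segments = nodePath.substring(1).split("/");
        StringBuilder prefix = new StringBuilder();
        // Stop before the last segment: the node itself is not indexed
        // as its own ancestor.
        for (int depth = 1; depth < segments.length; depth++) {
            prefix.append('/').append(segments[depth - 1]);
            fields.put("path" + depth, prefix.toString());
        }
        return fields;
    }

    // Maps a pattern such as "/a/b/c/%" to {fieldName, value}.
    // Only a single trailing "/%" is supported; patterns with other
    // wildcards are rejected (escaping a literal '_' is not handled).
    static String[] termForLikePattern(String likePattern) {
        if (!likePattern.endsWith("/%")) {
            throw new IllegalArgumentException("expected a pattern ending in /%");
        }
        String prefix = likePattern.substring(0, likePattern.length() - 2);
        if (prefix.isEmpty() || prefix.contains("%") || prefix.contains("_")) {
            throw new IllegalArgumentException("unsupported pattern: " + likePattern);
        }
        int depth = prefix.substring(1).split("/").length;
        return new String[] { "path" + depth, prefix };
    }
}
```

With this sketch, the pattern '/a/b/c/%' resolves to the field path3 with
the value /a/b/c, which is the direct match described above. The cost is one
extra index entry per ancestor level for every node.
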
>
> Perhaps something like this is already in the code.  Are ChildAxisQuery and
> DescendantSelfAxisQuery currently used for cases like this?
>
> -Dave
>
> On 3/12/07, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
> >
> > David Johnson wrote:
> > > I think I was again focusing on range queries and giving Lucene some way
> > > of filtering out subsets of the document set, so that the whole document
> > > set wouldn't have to be walked.  For the date range query the from and to
> > > dates would most likely share some set of most significant bytes - these
> > > bytes could just be passed to Lucene as a direct match, thereby reducing
> > > the subset of the collection that would be walked.  If the range query is
> > > fixed this "optimization" would be unnecessary.  Nevertheless, I still
> > > wonder if there is additional information that could be stored in Lucene
> > > to augment the index and improve query processing.
> >
> > ah, now I see. yes, that might help in some cases. e.g. you could say get
> > me all documents with a year value of 2007 and month value of 7, which
> > would be equivalent to a range query 2007-07-01 to 2007-07-31
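
A rough sketch of the equivalence quoted above, assuming each document were
indexed with separate "year" and "month" fields (hypothetical names; nothing
in this thread says Jackrabbit does this). It derives the exact-match values
shared by both ends of a date range and checks whether those matches alone
cover the range, as in the 2007-07 example.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Jackrabbit or Lucene code.
class DateRangeFields {

    // Returns the date components shared by both ends of the range.
    // These could serve as exact-match terms that narrow the candidate set.
    static Map<String, Integer> sharedFields(LocalDate from, LocalDate to) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        if (from.getYear() == to.getYear()) {
            fields.put("year", from.getYear());
            // The month is only a usable filter when the year matches too.
            if (from.getMonthValue() == to.getMonthValue()) {
                fields.put("month", from.getMonthValue());
            }
        }
        return fields;
    }

    // True when the range is exactly one calendar month, so the year and
    // month matches replace the range query instead of only narrowing it.
    static boolean isWholeMonth(LocalDate from, LocalDate to) {
        return from.getYear() == to.getYear()
                && from.getMonthValue() == to.getMonthValue()
                && from.getDayOfMonth() == 1
                && to.getDayOfMonth() == to.lengthOfMonth();
    }
}
```

For any other range the shared fields only pre-filter the candidates, and
the original range check still has to run on what remains.
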
> >
> > > In this case I was considering using the node UUID as the cross-index
> > > join parameter.  Still, there is the problem of combining the results
> > > from two different indexes.
> >
> > there are two issues with this approach:
> > 1) getting the UUID requires lucene to load the document
> > 2) implementing an *efficient* join across system boundaries is not easy,
> >    even if the documents are sorted.
> >
> > >> 3) Use the database to provide the indexing structures.
> > >>
> > >> To me this seems to be a very interesting option, though it requires
> > >> considerable effort.
> > >
> > > Yes, I agree, this is an interesting option, and it does seem that it
> > > would take a fair amount of effort.  Your comments on the user list to
> > > this same thread seem like a start to the thought process needed.  I am
> > > not very familiar with the details of the PM, although I do think that
> > > bringing together data storage and indexing will help with improving
> > > query processing speed, as well as help with some data integrity issues
> > > that have been discussed in other threads.
> > >
> > > Over the weekend, I will see if I can come up with a solution to the
> > > range query issue discussed above.
> >
> > great.
> >
> > regards
> >   marcel
> >
>
