jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Query Performance and Optimization
Date Mon, 12 Mar 2007 15:31:21 GMT
David Johnson wrote:
> I think I was again focusing on range queries and giving Lucene some way of
> filtering out subsets of the document set, so that the whole document set
> wouldn't have to be walked.  For the date range query the from and to dates
> would most likely share some set of most significant bytes - these bytes
> could just be passed to Lucene as a direct match, thereby reducing the subset
> of the collection that would be walked.  If the range query is fixed this
> "optimization" would be unnecessary.  Nevertheless, I still wonder if there
> is additional information that could be stored in Lucene to augment the
> index and improve query processing.

Ah, now I see. Yes, that might help in some cases. For example, you could ask
for all documents with a year value of 2007 and a month value of 7, which would
be equivalent to a range query from 2007-07-01 to 2007-07-31.
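
A minimal sketch of that idea in plain Java, not Jackrabbit or Lucene code. It assumes dates are rendered as yyyyMMdd strings; the class name DatePrefixSketch and its methods are hypothetical. The shared leading characters of the two bounds act as the direct-match term, and the exact range check only runs on candidates that pass it. The loop over the list stands in for a term lookup in a real index.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Jackrabbit.
class DatePrefixSketch {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20070701.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    /** Longest common prefix of both range bounds in yyyyMMdd form. */
    static String sharedPrefix(LocalDate from, LocalDate to) {
        String a = from.format(FMT);
        String b = to.format(FMT);
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    /** Returns the dates within [from, to], pre-filtered by the shared prefix. */
    static List<LocalDate> filter(List<LocalDate> docs, LocalDate from, LocalDate to) {
        String prefix = sharedPrefix(from, to);
        List<LocalDate> hits = new ArrayList<>();
        for (LocalDate d : docs) {
            // Cheap direct match; assumed to be an indexed term lookup in practice.
            if (!d.format(FMT).startsWith(prefix)) {
                continue;
            }
            // Exact range check on the reduced candidate set (bounds inclusive).
            if (!d.isBefore(from) && !d.isAfter(to)) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

For the range 2007-07-01 to 2007-07-31 the shared prefix covers the year and month, so the direct match alone already selects the right documents. For a range such as 2007-06-15 to 2007-07-15 the prefix shrinks to the year, and the exact check still has to do most of the work.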

> In this case I was considering using the node UUID as the cross-index join
> parameter.  Still, there is the problem of combining the results from two
> different indexes.

There are two issues with this approach:
1) getting the UUID requires Lucene to load the document
2) implementing an *efficient* join across system boundaries is not easy, even
if the documents are sorted.
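
For reference, the sorted case can be sketched as a plain merge join over two UUID streams. This is an in-memory illustration with hypothetical names (SortedUuidJoinSketch, intersect), assuming each stream is sorted in String order and contains no duplicates. It deliberately leaves out the two costs named above: every next() call on the Lucene side would mean loading a document to read its UUID, and on the other side it could mean a round trip to another system.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of Jackrabbit or Lucene.
class SortedUuidJoinSketch {
    /**
     * Intersects two UUID streams. Both are assumed to be sorted in
     * String order and free of duplicates.
     */
    static List<String> intersect(Iterator<String> left, Iterator<String> right) {
        List<String> out = new ArrayList<>();
        if (!left.hasNext() || !right.hasNext()) {
            return out;
        }
        String l = left.next();
        String r = right.next();
        while (true) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {
                // Same UUID on both sides: emit it and advance both streams.
                out.add(l);
                if (!left.hasNext() || !right.hasNext()) {
                    break;
                }
                l = left.next();
                r = right.next();
            } else if (cmp < 0) {
                // Left side is behind: advance it.
                if (!left.hasNext()) {
                    break;
                }
                l = left.next();
            } else {
                // Right side is behind: advance it.
                if (!right.hasNext()) {
                    break;
                }
                r = right.next();
            }
        }
        return out;
    }
}
```

The merge visits each element at most once, so the cost is linear in the combined size of both result sets. That is the problem when one side is large and the other is small: the large side is still walked in full unless the source can skip ahead to a given UUID.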

>> 3) Use the database to provide the indexing structures.
>> To me this seems to be a very interesting option, though it requires
>> considerable effort.
> Yes, I agree, this is an interesting option, and does seem that it would
> take a fair amount of effort.  Your comments on the user list to this same
> thread seem like a start to the thought process needed.  I am not very
> familiar with the details of the PM, although I do think that bringing
> together data storage and indexing will help with improving query processing
> speed, as well as help with some data integrity issues that have been
> discussed in other threads.
> Over the weekend, I will see if I can come up with a solution to the range
> query issue discussed above.


