jackrabbit-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: Re: improving the scalability in searching
Date Tue, 21 Aug 2007 21:17:09 GMT

> Christoph Kiehl wrote:
> In general I think it's a good idea to have a 1:1 mapping of properties to
> Lucene fields. It's just more natural and easier to understand, as you said.
> Performance-wise I'm not sure it will gain you "lots of performance". I just
> had a quick look at the code and found the following places where I think
> performance will improve:
> 1. DerefQuery can directly query for matching documents instead of iterating
>    over all context hits.
> 2. MatchAllScorer would perform better. But you made an even better
>    suggestion for how to handle those in the future.
> 3. WildcardQuery will probably improve a bit because there are fewer terms.
> 4. Regarding sorting: we will still need our own sorting because we cache the
>    document order per sub-reader, whereas Lucene's sorting only caches per
>    reader, which gets invalidated after every write operation. But the
>    initial cache creation will be faster.

That is a good point! Did the sorting cache store the terms including their field prefix? If so,
then instead of a performance gain we might gain quite some memory efficiency (though I am
guessing a little here :-) )
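Christoph's point 4, caching sort data per sub-reader rather than per top-level reader, can be
illustrated with a small self-contained sketch. The Segment class, the method names and the
WeakHashMap-based cache are hypothetical stand-ins, not Jackrabbit's or Lucene's actual classes.
Each immutable segment gets its sort values computed once, so opening a new top-level reader
after a write only pays for the segments that are new.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical sketch (not Jackrabbit's or Lucene's real classes): sort values are
 * cached per immutable segment instead of per top-level reader.
 */
public class PerSegmentSortCache {

    /** Stand-in for an immutable index segment: the sort value of each of its documents. */
    public static final class Segment {
        final String[] values;

        public Segment(String... values) {
            this.values = values;
        }
    }

    // Keyed by segment identity (Segment does not override equals/hashCode);
    // WeakHashMap drops an entry once its segment is no longer referenced.
    private final Map<Segment, String[]> cache = new WeakHashMap<>();

    /** Number of times sort values had to be computed. */
    public int misses = 0;

    /** Sort values for one segment, computed only on first use. */
    public String[] segmentValues(Segment segment) {
        return cache.computeIfAbsent(segment, s -> {
            misses++;                 // in a real index this would be an expensive term walk
            return s.values.clone();
        });
    }

    /** A "top-level reader" is modelled as a list of segments. */
    public List<String[]> readerValues(List<Segment> reader) {
        List<String[]> result = new ArrayList<>();
        for (Segment segment : reader) {
            result.add(segmentValues(segment));
        }
        return result;
    }

    public static void main(String[] args) {
        PerSegmentSortCache cache = new PerSegmentSortCache();
        Segment first = new Segment("b", "a");
        Segment second = new Segment("c");
        cache.readerValues(List.of(first, second));        // computes both segments
        Segment added = new Segment("d");                   // a write adds one new segment
        cache.readerValues(List.of(first, second, added));  // only the new segment is computed
        System.out.println(cache.misses);                   // 3
    }
}
```

Keying on segment identity is what makes the reuse possible: a write typically adds new segments
and leaves most existing ones untouched, so their cached arrays stay valid, whereas a cache keyed
on the top-level reader starts from scratch every time that reader is reopened.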

> Overall I wouldn't expect _much_ better performance. Or could you explain
> what other performance improvements you expect?

I think most improvements (performance and memory consumption) are small. The big fish was
indeed replacing the MatchAllScorer with PROPERTIES_SET. At the moment I cannot foresee whether
other parts might become easier or faster. Besides keeping all unit tests working, I should
probably include a performance unit test to see whether there are substantial gains. My other
plan, 'virtual node indexing' (indexing nodes that do not really exist, only for searching), could
make searches substantially faster, but for now it would imply non-JSR, custom Jackrabbit code,
which Marcel has already said he dislikes :-(
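As we read the discussion, the PROPERTIES_SET idea is to index, for every node, the names of the
properties it has, so that a constraint like "node has property X" becomes a single term lookup
instead of a MatchAllScorer that walks every value term of X. The sketch below models this with a
toy in-memory inverted index and assumes the 1:1 property-to-field mapping; apart from the
PROPERTIES_SET name taken from this thread, all classes, methods and data are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, not Lucene): field -> term -> ids of documents. */
public class PropertiesSetSketch {
    // Field name taken from the mailing-list thread; the rest of the layout is assumed.
    static final String PROPERTIES_SET = "PROPERTIES_SET";

    private final Map<String, TreeMap<String, TreeSet<Integer>>> index = new HashMap<>();

    private void addTerm(String field, String term, int doc) {
        index.computeIfAbsent(field, f -> new TreeMap<>())
             .computeIfAbsent(term, t -> new TreeSet<>())
             .add(doc);
    }

    /** Indexes one node: one field per property (1:1 mapping) plus its property names. */
    public void addNode(int doc, Map<String, String> properties) {
        for (Map.Entry<String, String> p : properties.entrySet()) {
            addTerm(p.getKey(), p.getValue(), doc);     // value term in the property's own field
            addTerm(PROPERTIES_SET, p.getKey(), doc);   // "this node has property p"
        }
    }

    /** Without PROPERTIES_SET: union the postings of every value term of the property. */
    public Set<Integer> hasPropertyByScanningValues(String property) {
        Set<Integer> docs = new TreeSet<>();
        for (Set<Integer> postings : index.getOrDefault(property, new TreeMap<>()).values()) {
            docs.addAll(postings);
        }
        return docs;
    }

    /** With PROPERTIES_SET: a single posting-list lookup. */
    public Set<Integer> hasProperty(String property) {
        TreeSet<Integer> postings =
            index.getOrDefault(PROPERTIES_SET, new TreeMap<>()).get(property);
        return postings == null ? Collections.emptySet() : Collections.unmodifiableSet(postings);
    }

    public static void main(String[] args) {
        PropertiesSetSketch idx = new PropertiesSetSketch();
        idx.addNode(0, Map.of("title", "foo", "author", "ard"));
        idx.addNode(1, Map.of("title", "bar"));
        idx.addNode(2, Map.of("author", "christoph"));
        System.out.println(idx.hasProperty("title"));                 // [0, 1]
        System.out.println(idx.hasPropertyByScanningValues("title")); // [0, 1]
    }
}
```

Both methods return the same documents; the difference is that the first touches one posting list
per distinct value of the property, while the second touches exactly one, at the cost of one extra
term per property when indexing.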

An example of something that would gain performance with the 1:1 mapping is a part I am
implementing in a custom class: I want to query for all distinct terms in field X and count
them (faceted views). [The code will be open source, so if people are interested I can give
pointers to it in due time (or, better, put it in the Jackrabbit trunk if there is room for it).]
I do not think this could be implemented in a performant way without the 1:1 mapping.

I am not sure whether there is an XPath equivalent of "give me all distinct values of a
property"... probably not, right?
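A minimal sketch of the facet count described above, assuming the 1:1 mapping so that field X
contains only the values of property X. The in-memory term dictionary and all names are
hypothetical, not Jackrabbit or Lucene code; the point is that the count becomes a walk over one
field's terms, reading the size of each posting list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory term dictionary for a 1:1 property-to-field mapping:
 * field name -> (sorted term text -> ids of documents containing it).
 */
public class FacetCountSketch {
    private final Map<String, TreeMap<String, TreeSet<Integer>>> fields = new HashMap<>();

    public void index(int doc, String property, String value) {
        fields.computeIfAbsent(property, p -> new TreeMap<>())
              .computeIfAbsent(value, v -> new TreeSet<>())
              .add(doc);
    }

    /** Distinct values of one property with the number of documents holding each (a facet). */
    public SortedMap<String, Integer> facet(String property) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        // With one field per property, every term of the field is a value of that property,
        // so we walk exactly the relevant terms and read each posting list's size.
        for (Map.Entry<String, TreeSet<Integer>> term
                : fields.getOrDefault(property, new TreeMap<>()).entrySet()) {
            counts.put(term.getKey(), term.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        FacetCountSketch idx = new FacetCountSketch();
        idx.index(0, "color", "red");
        idx.index(1, "color", "blue");
        idx.index(2, "color", "red");
        idx.index(2, "size", "large");
        System.out.println(idx.facet("color")); // {blue=1, red=2}
    }
}
```

Against a real Lucene 2.x index the analogous loop would, roughly, position a TermEnum with
IndexReader.terms(new Term(field, "")), stop at the first term from a different field, and read
docFreq() for each term; note that docFreq() still counts deleted documents until their segments
are merged.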

> But I would definitely like to see the 1:1 mapping, because some parts of the
> code become better/easier to understand, and even those small performance
> improvements are a gain.

Yes, I think so too. It took me hours to understand the current Jackrabbit indexes when I first
opened them with Luke :-)

> I wouldn't mind if you just started working on it ;) I'm sure Marcel is happy
> to answer your questions, as am I if I'm able to ;)
> You could open a second issue for the 1:1 mapping. Then just use those two
> issues and attach patches. I'll definitely review them and try to help.

OK. I'll file a JIRA issue for this on Thursday, because tomorrow I am occupied all day.

> Thanks a lot for your efforts!

You're welcome

Regards Ard

> Cheers,
> Christoph
