jackrabbit-dev mailing list archives

From Christoph Kiehl <christ...@sulu3000.de>
Subject Re: improving the scalability in searching
Date Tue, 21 Aug 2007 22:27:31 GMT
Ard Schrijvers wrote:
>> Christoph Kiehl wrote: 
>> 4. Regarding sorting: We will still need our own sorting because we cache
>> the document order per subreader, whereas Lucene's sorting only caches per
>> reader, and that cache gets invalidated after every write operation. But the
>> initial cache creation will be faster.
> That is a good point! I think the field prefixes of the terms were not used
> in the sorting cache, or were they? If they were, then instead of a
> performance gain, we might gain quite some memory efficiency (though I am
> guessing here a little :-) )

Unfortunately, it doesn't even help with memory consumption, because we only
cache the terms themselves, without the prefixes.
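
To illustrate why a per-subreader cache survives write operations, here is a minimal sketch. It is not Jackrabbit's or Lucene's actual code: the class `PerSegmentSortCache`, the segment ids, and the `load_values` callback are hypothetical names, and a "segment" stands in for an immutable subreader.

```python
from typing import Callable, Dict, List


class PerSegmentSortCache:
    """Caches sort values per immutable segment (hypothetical stand-in for a subreader)."""

    def __init__(self, load_values: Callable[[str], List[str]]) -> None:
        self._load_values = load_values  # expensive: reads all terms of one segment
        self._cache: Dict[str, List[str]] = {}
        self.loads = 0  # counts cache misses, for illustration only

    def values_for(self, segment_id: str) -> List[str]:
        # Segments never change once written, so a cached entry stays valid.
        if segment_id not in self._cache:
            self._cache[segment_id] = self._load_values(segment_id)
            self.loads += 1
        return self._cache[segment_id]

    def retain(self, live_segments: List[str]) -> None:
        """Drop entries for segments that were merged away or deleted."""
        live = set(live_segments)
        for segment_id in list(self._cache):
            if segment_id not in live:
                del self._cache[segment_id]


# Toy data: segment id -> sort values of its documents.
segments = {"seg0": ["b", "a"], "seg1": ["d", "c"]}
cache = PerSegmentSortCache(lambda seg: segments[seg])

for seg in list(segments):
    cache.values_for(seg)      # initial fill: both segments are loaded

segments["seg2"] = ["e"]       # a write operation adds a new segment
for seg in list(segments):
    cache.values_for(seg)      # only seg2 has to be loaded now

print(cache.loads)
```

A cache keyed on the whole reader would have to reload every segment after the write; here only the new segment costs anything.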

> I think that, besides keeping all unit tests working, I might/should
> include a performance unit test to see if there are substantial gains.

Well, it would be great to have such a performance test, but in my experience
the repository you run your test against has to be at least of a certain size
to show a notable difference. It's difficult to create such a repository in a
portable way: it's too big to check into Subversion and too big to create on
the fly. It would be great to have some kind of reference repository. I thought
about taking a Wikipedia snapshot (these are available for download) and
pumping its data into the repository. This will result in quite a big repository ...
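
As a rough idea of what pumping such a snapshot could look like, here is a minimal streaming sketch. It assumes the dump is one XML file with `page` elements containing `title` and `revision/text`; the sample namespace and the `store_page` callback are hypothetical, and the actual repository write is left out entirely.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Stream (title, text) pairs without loading the whole dump into memory."""
    title, text = None, ""
    for _event, elem in ET.iterparse(dump, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None:
                yield title, text
            title, text = None, ""
            elem.clear()  # free the children of the finished page


# Tiny in-memory sample; the namespace is a placeholder, not the real one.
SAMPLE = b"""<mediawiki xmlns="http://example.org/export">
  <page><title>Apache Jackrabbit</title>
    <revision><text>A content repository.</text></revision></page>
  <page><title>Lucene</title>
    <revision><text>A search library.</text></revision></page>
</mediawiki>"""

stored = []


def store_page(title: str, text: str) -> None:
    """Hypothetical stand-in for creating a node in the repository."""
    stored.append((title, text))


for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
    store_page(page_title, page_text)

print(len(stored))
```

Streaming matters here because a full snapshot will not fit into memory as a parsed tree.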

> I am not sure if there is an XPath equivalent to "give me all distinct
> values of a property"...probably not, right?

I'm afraid not.
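
Without query support, the distinct values would have to be collected on the client side while iterating over the result nodes. A minimal sketch of that idea, with plain dicts as hypothetical stand-ins for nodes and a made-up `distinct_values` helper:

```python
from typing import Any, Iterable, List, Mapping


def distinct_values(nodes: Iterable[Mapping[str, Any]], prop: str) -> List[Any]:
    """Collect the distinct values of one property across all given nodes."""
    seen = set()
    for node in nodes:
        value = node.get(prop)
        if value is None:
            continue  # the node does not have this property
        # Treat multi-valued properties (lists/tuples) as several values.
        values = value if isinstance(value, (list, tuple)) else [value]
        seen.update(values)
    return sorted(seen)


# Toy result set standing in for the nodes returned by a query.
result_nodes = [
    {"title": "a", "category": "news"},
    {"title": "b", "category": ["news", "sports"]},
    {"title": "c"},
    {"title": "d", "category": "tech"},
]

print(distinct_values(result_nodes, "category"))
```

This visits every matching node, so it gets expensive on large result sets, which is exactly where index-level support would help.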

>> I wouldn't mind if you just start working on it ;) I'm sure Marcel is happy
>> to answer your questions, as am I if I'm able to ;) You could open a second
>> issue for the 1:1 mapping. Then just use those two issues and attach
>> patches. I'll definitely review them and try to help.
> Ok. I'll file a JIRA issue on Thursday for this, because tomorrow I am
> occupied all day.


