jackrabbit-users mailing list archives

From James Abley <james.ab...@gmail.com>
Subject Re: Should Lucene index file size reduce when items are deleted?
Date Wed, 10 Jun 2009 20:29:36 GMT
2009/6/10 Marcel Reutegger <marcel.reutegger@gmx.net>:
> Hi,
> 2009/6/9 Shaun Barriball <sbarriba@yahoo.co.uk>:
>> Hi Alex et al,
>> Noted on the performance comment, which prompts the question:
>>  * What's the best way to monitor Lucene memory usage and performance to
>> identify bad queries or bloated indexes? In a MySQL world you would use
>> the slow query log.
> there's a debug log message for
> org.apache.jackrabbit.core.query.QueryImpl that includes the statement
> and the time it took to execute. If you direct that into a separate
> log file and apply some tail/grep magic, you should be able to get a
> log that shows slow queries.
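A minimal sketch of that tail/grep magic, assuming a hypothetical log line format that ends in "executed in &lt;n&gt; ms" (the real layout depends on your log pattern configuration, so adjust the matching accordingly):

```shell
# Filter a Jackrabbit query log for slow statements. The line format
# assumed here ("... executed in <n> ms") is an illustration, not the
# actual Jackrabbit output; adapt the awk matching to your log pattern.
slow_queries() {
  awk -v limit="${1:-500}" '
    { for (i = 2; i <= NF; i++)
        if ($i == "ms" && $(i-1) + 0 > limit) { print; next } }'
}

# usage: tail -F logs/jackrabbit-query.log | slow_queries 500
```

Piping `tail -F` through the function gives a live feed of only the queries slower than the threshold, which is roughly what the MySQL slow query log provides.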
>> And following up on the Lucene index size question.
>> * Is there a way to force Jackrabbit to clean up the Lucene indexes -
>> assuming we're looking to reclaim disk space, for example - rather than
>> just waiting for the segments to merge?
> No, there's currently no such tool. However, I consider this a useful
> enhancement.
>> For example:
>> * Is there a way to ask Jackrabbit to call
>> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#optimize()?
> No, there isn't.
>> * If we delete the "index" directory, will Jackrabbit happily reconstruct a
>> consolidated index from scratch?
> Yes, it will. That's currently the only way to get an index with all
> segments optimized.
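For reference, a rough sketch of that rebuild-from-scratch procedure. The index paths below are assumptions based on the default repository layout; check your repository.xml and workspace.xml before deleting anything:

```shell
# Sketch: force Jackrabbit to rebuild its search indexes from scratch.
# The repository MUST be stopped first; deleting an index from under a
# running instance will corrupt its search state.
REPO_HOME=/path/to/repository   # assumption: your actual repository home

# remove each workspace index and the version-store index
rm -rf "$REPO_HOME"/workspaces/*/index
rm -rf "$REPO_HOME"/repository/index

# on the next startup Jackrabbit re-crawls the content and writes a
# fresh, fully merged index
```

Expect the restart to take a while on a large repository, since every node has to be re-indexed.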
>> Some of the content in our Jackrabbit repository is high volume and fairly
>> transient, lasting only a few weeks before being deleted, hence the index
>> question is more relevant for us.
> In general, short-lived content is purged (not just marked as
> deleted) from the index fairly quickly, because the merge policy is
> generational. The longer an item lives, the harder it gets to purge it
> from the index. It's somewhat similar to garbage collection in Java:
> once an object is in perm space, it is more expensive to collect.
> I currently see two options for how Jackrabbit could better handle your case:
> - introduce a method that lets you trigger an index optimization (as
> you suggested)
> - introduce a threshold on the ratio of deleted nodes to live nodes at
> which an index segment is automatically optimized
> At the moment I prefer the latter because it does not require manual
> interaction. WDYT?
> regards
>  marcel

Hi Marcel,

I think runtime tuning is fine; I'd also be in favour of seeing it
exposed via JMX, so that monitoring could pick up 'thrashing' and
operations staff could script corrective action. Jackrabbit
(1.4.x at least) does not seem to expose much via JMX, but now that
Java 5 is required, it seems reasonable to leverage JMX more.


