jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Should Lucene index file size reduce when items are deleted?
Date Wed, 10 Jun 2009 07:11:31 GMT

2009/6/9 Shaun Barriball <sbarriba@yahoo.co.uk>:
> Hi Alex et al,
> Noted on the performance comment which prompts the question:
>  * What's the best way to monitor Lucene memory usage and performance to
> determine bad queries or bloated indexes? In a MySQL world you could use
> the Slow Query log.

There's a debug log message for
org.apache.jackrabbit.core.query.QueryImpl that includes the statement
and the time it took to execute. If you direct that logger into a
separate log file and apply some tail/grep magic, you should be able to
get a log that shows slow queries.
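
For example, assuming a log4j 1.x backend behind SLF4J (adjust for
logback or whatever binding you actually use; the file name and pattern
below are just illustrative), a log4j.properties fragment along these
lines routes those messages into their own file:

    # Send DEBUG output of the query class to a dedicated file so slow
    # statements can be grepped out of it later.
    log4j.logger.org.apache.jackrabbit.core.query.QueryImpl=DEBUG, querylog
    # Do not also copy these messages into the main log file.
    log4j.additivity.org.apache.jackrabbit.core.query.QueryImpl=false

    log4j.appender.querylog=org.apache.log4j.FileAppender
    log4j.appender.querylog.File=query.log
    log4j.appender.querylog.layout=org.apache.log4j.PatternLayout
    log4j.appender.querylog.layout.ConversionPattern=%d %m%n

A bit of grep/sort on the resulting file then gives you a rough
equivalent of a slow query log.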

> And following up on the Lucene index size question.
> * Is there a way to force Jackrabbit to clean up the Lucene indexes
> (assuming we're looking to reclaim disk space, for example) rather than
> just waiting for the segments to merge?

No, there's currently no such tool. However, I consider this a useful
addition; see the two options further down.

> For example:
> * Is there a way to ask JackRabbit to call
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#optimize()?

No, there isn't.

> * If we delete the "index" directory will JackRabbit happily reconstruct a
> consolidated index from scratch?

Yes, it will. That's currently the only way to get an index with all
segments optimized.
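
Roughly, the procedure is (a sketch assuming the default repository
layout; double-check the paths against your repository.xml before
deleting anything, and make sure the repository is shut down first):

    # $REPO_HOME stands for your repository home directory (illustrative).
    rm -rf $REPO_HOME/repository/index      # search index of the /jcr:system tree
    rm -rf $REPO_HOME/workspaces/*/index     # per-workspace search indexes

On the next startup Jackrabbit re-indexes the content from scratch,
which can take a while on a large repository.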

> Some of the content in our JackRabbit repository is high volume and
> fairly transient, lasting only a few weeks before being deleted, hence
> the index question is more relevant for us.

In general, short-lived content is purged quite well (not just marked
as deleted) from the index, because the merge policy is generational.
The longer an item lives, the harder it gets to purge it from the
index. It's somewhat similar to garbage collection in Java: once an
object is in the perm space, it is more expensive to collect it.

I currently see two options for how Jackrabbit could better handle your case:

- introduce a method that lets you trigger an index optimization
explicitly (as you suggested)
- introduce a threshold for the ratio of deleted nodes to live nodes
above which an index segment is automatically optimized

At the moment I prefer the latter because it does not require manual
intervention. WDYT?


> Regards,
> Shaun
> -----Original Message-----
> From: Alexander Klimetschek [mailto:aklimets@day.com]
> Sent: 08 June 2009 13:12
> To: users@jackrabbit.apache.org
> Subject: Re: Should Lucene index file size reduce when items are deleted?
> On Mon, Jun 8, 2009 at 1:41 PM, Shaun Barriball <sbarriba@yahoo.co.uk> wrote:
>> Thanks Marcel.
>> From a performance and memory usage perspective, should we see the
>> benefits of the deletion immediately or is the Lucene performance linked
>> to the index file sizes (and therefore reliant on the merge happening)?
> Indexing structures such as the Lucene fulltext index tend to use more
> disk space to drastically enhance access (query) performance.
> space performance != processing time performance
> Regards,
> Alex
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
