cocoon-dev mailing list archives

From "Grzegorz Kossakowski (JIRA)" <>
Subject [jira] Commented: (COCOON-2065) huge performance increase of LuceneIndexTransformer on large Lucene indexes
Date Sun, 08 Jul 2007 10:44:04 GMT


Grzegorz Kossakowski commented on COCOON-2065:

Thanks Dominique for posting a patch.

Since you have already offered to help with updating the documentation, would you like to move the page
from the wiki to our official documentation repository, which is located at
It would be preferable to have that information in the official docs.

Documents from Daisy will be published on the official, reworked site soon.

> huge performance increase of LuceneIndexTransformer on large Lucene indexes
> ---------------------------------------------------------------------------
>                 Key: COCOON-2065
>                 URL:
>             Project: Cocoon
>          Issue Type: Improvement
>          Components: Blocks: Lucene
>    Affects Versions: 2.1.6, 2.1.7, 2.1.8, 2.1.9, 2.1.10, 2.1.11-dev (Current SVN), 2.2-dev (Current SVN)
>            Reporter: Dominique De Munck
>            Priority: Minor
>             Fix For: 2.1.11-dev (Current SVN), 2.2-dev (Current SVN)
>         Attachments: LuceneIndexTransformer.patch
> The LuceneIndexTransformer optimizes the Lucene index every time an entry is added to the index.
> This slows indexing down enormously when the index is large. If, for example, you use the transformer to update an entry on every check-in of a document, each update becomes slow.
> For example, on a Pentium IV 2.4 GHz with a Lucene index containing 10,000 documents, the index update itself takes only about 60 ms, while the optimize that gets called can take 7 seconds!
> I've created a patch that introduces an option "optimize-frequency" to determine the frequency of the optimize call.
> It defaults to 1 (the current behaviour); when a user sets it to 50, the index is optimized only once every 50 updates, and so on.
> If no optimization is wanted, you can set it to 0.
> This is consistent with the Lucene documentation (fragment of the Lucene FAQ):
> "The IndexWriter class supports an optimize() method that compacts the index database and speedup queries. You may want to use this method after performing a complete indexing of your document set or after incremental updates of the index. If your incremental update adds documents frequently, you want to perform the optimization only once in a while to avoid the extra overhead of the optimization."
> Added a configuration option and a function "needToOptimize()", which is called before optimizing.
> needToOptimize() uses a random number generator, to keep the code simple.
> - when the option is not set, CODE WILL BE EXECUTED AS BEFORE
> - tested on the 2.1.11 SVN branch; there are no differences from the "main" trunk, so the patch can be applied there as well.
> - Updated API docs
> - if patch accepted, I will also update the Wiki:
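
The "optimize-frequency" behaviour described in the issue above might look roughly like the following Java sketch. This is not the code from LuceneIndexTransformer.patch: the class name OptimizeFrequencySketch, its constructor, and the use of Random.nextInt() are assumptions made for illustration. Only the option semantics (0 = never, 1 = always, N = about once every N updates) and the method name needToOptimize() come from the issue text.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a randomized "optimize-frequency" check.
 * Not the actual patch; names other than needToOptimize() are invented here.
 */
public class OptimizeFrequencySketch {
    private final int optimizeFrequency;
    private final Random random;

    public OptimizeFrequencySketch(int optimizeFrequency, Random random) {
        if (optimizeFrequency < 0) {
            throw new IllegalArgumentException("optimize-frequency must be >= 0");
        }
        this.optimizeFrequency = optimizeFrequency;
        this.random = random;
    }

    /** Decides, for a single index update, whether optimize() should be called. */
    public boolean needToOptimize() {
        if (optimizeFrequency == 0) {
            return false; // 0 disables optimization entirely
        }
        if (optimizeFrequency == 1) {
            return true;  // 1 keeps the old behaviour: optimize on every update
        }
        // For N > 1, return true with probability 1/N, so no counter has to be
        // kept between updates (assumed reading of "random" in the issue).
        return random.nextInt(optimizeFrequency) == 0;
    }

    public static void main(String[] args) {
        OptimizeFrequencySketch policy = new OptimizeFrequencySketch(50, new Random());
        int optimizations = 0;
        for (int update = 0; update < 10000; update++) {
            if (policy.needToOptimize()) {
                optimizations++; // a real transformer would call the index optimize here
            }
        }
        // The count varies from run to run; its expected value is 10000 / 50 = 200.
        System.out.println("optimize would have run " + optimizations + " times in 10000 updates");
    }
}
```

With a random check like this, the optimize call runs once every N updates only on average, not exactly. A counter would make the interval exact, at the cost of keeping state between updates.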

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
