cocoon-dev mailing list archives

From Jeremy Quinn <jer...@media.demon.co.uk>
Subject Re: luceneINdexTransformer not optimized
Date Wed, 17 Nov 2004 09:34:22 GMT
Many thanks
I will review this as soon as I can.

regards Jeremy


On 16 Nov 2004, at 23:03, Nicolas Maisonneuve wrote:

> see http://issues.apache.org/bugzilla/show_bug.cgi?id=32263
>
>
> On Tue, 16 Nov 2004 11:04:39 +0000, Jeremy Quinn
> <jeremy@media.demon.co.uk> wrote:
>> Dear Nicolas
>>
>> If you were to provide a patch and send it to Bugzilla (then notify me
>> of the bug #), I would be happy to review it.
>>
>> regards Jeremy
>>
>>
>>
>>
>> On 15 Nov 2004, at 23:34, Nicolas Maisonneuve wrote:
>>
>>> The method that updates a document (reindexDocument) is not
>>> optimized. The current behavior is:
>>>
>>> 1- open the reader if it is not open (in practice it is always
>>> closed, because of step 3)
>>> 2- delete the document
>>> 3- close the reader
>>> 4- open the writer
>>> 5- write the document to the index
>>> 6- close the writer
>>>
>>> (NOTE: with this behavior, the merge factor is useless, because the
>>> method indexes only one document each time the IndexWriter is opened.)
>>>
>>> - One Lucene optimization is to avoid repeatedly opening and closing
>>> the IndexReader and IndexWriter.
>>>
>>> So I propose this simple optimization:
>>> 1- open reader if not open
>>> 2- delete document
>>> 3- store Lucene document in a buffer (Stack)
>>>
>>> // flush the buffer when it is full
>>> if (buffer.size() == max_buffer) {
>>>
>>>    // switch to write mode
>>> 4-   close reader
>>> 5-   open writer
>>>      for (each document in buffer) {
>>> 6-      write document
>>>      }
>>> 7-   close writer
>>>      clear buffer
>>> }
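A minimal Java sketch of the buffered update above. It assumes hypothetical `IndexReaderLike`, `IndexWriterLike` and `IndexOpener` interfaces as stand-ins for Lucene's IndexReader (delete side) and IndexWriter (add side), so it runs without Lucene; it is not the code from the Bugzilla patch. Two details are additions not in the pseudocode: a `close()` that flushes whatever is left in the buffer when indexing ends, and a buffer keyed by document key, so that updating the same document twice before a flush does not write it twice (the second delete runs before the first buffered copy has reached the index).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the interfaces below are HYPOTHETICAL stand-ins for Lucene's
// IndexReader (delete side) and IndexWriter (add side).
public class BufferedReindexer {

    interface IndexReaderLike { void deleteByKey(String key); void close(); }
    interface IndexWriterLike { void addDocument(String key, String doc); void close(); }
    interface IndexOpener { IndexReaderLike openReader(); IndexWriterLike openWriter(); }

    private final IndexOpener opener;
    private final int maxBuffer;
    // Keyed by document key: a second update of the same document before a
    // flush replaces the buffered copy instead of adding a duplicate.
    private final Map<String, String> buffer = new LinkedHashMap<>();
    private IndexReaderLike reader; // null while no reader is open

    public BufferedReindexer(IndexOpener opener, int maxBuffer) {
        if (maxBuffer < 1) throw new IllegalArgumentException("maxBuffer must be >= 1");
        this.opener = opener;
        this.maxBuffer = maxBuffer;
    }

    public void reindexDocument(String key, String doc) {
        if (reader == null) reader = opener.openReader(); // 1- open reader if not open
        reader.deleteByKey(key);                          // 2- delete old version
        buffer.put(key, doc);                             // 3- buffer new version
        if (buffer.size() >= maxBuffer) flush();          // flush when the buffer is full
    }

    // Switch to write mode once for the whole buffer (steps 4 to 7).
    public void flush() {
        if (buffer.isEmpty()) return;
        if (reader != null) { reader.close(); reader = null; }  // 4- close reader
        IndexWriterLike writer = opener.openWriter();           // 5- open writer
        try {
            for (Map.Entry<String, String> e : buffer.entrySet()) {
                writer.addDocument(e.getKey(), e.getValue());   // 6- write
            }
        } finally {
            writer.close();                                     // 7- close writer
        }
        buffer.clear();
    }

    // Write whatever is left in the buffer and release the reader.
    public void close() {
        flush();
        if (reader != null) { reader.close(); reader = null; }
    }

    // In-memory fake that counts how often a reader or writer is opened.
    static final class CountingOpener implements IndexOpener {
        int readerOpens = 0;
        int writerOpens = 0;
        final List<String> indexed = new ArrayList<>();

        public IndexReaderLike openReader() {
            readerOpens++;
            return new IndexReaderLike() {
                public void deleteByKey(String key) { indexed.remove(key); }
                public void close() { }
            };
        }

        public IndexWriterLike openWriter() {
            writerOpens++;
            return new IndexWriterLike() {
                public void addDocument(String key, String doc) { indexed.add(key); }
                public void close() { }
            };
        }
    }

    static int writerOpensFor(int docs, int maxBuffer) {
        CountingOpener opener = new CountingOpener();
        BufferedReindexer indexer = new BufferedReindexer(opener, maxBuffer);
        for (int i = 0; i < docs; i++) indexer.reindexDocument("doc" + i, "content " + i);
        indexer.close();
        return opener.writerOpens;
    }

    public static void main(String[] args) {
        System.out.println(writerOpensFor(1000, 1));   // 1000: one writer per document
        System.out.println(writerOpensFor(1000, 100)); // 10: one writer per 100 documents
    }
}
```

With `maxBuffer = 1` the sketch behaves like the current reindexDocument (a reader/writer switch for every document); with `maxBuffer = 100`, reindexing 1000 documents opens the writer 10 times instead of 1000, which is where the merge factor starts to matter.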
>>>
>>>
>>> With this kind of method:
>>> 1- with a buffer of 100 documents, you divide the number of mode
>>> switches (write/read) by 100, and indexing is much faster;
>>> 2- the merge factor becomes really useful, because the IndexWriter
>>> indexes more than one document each time it is opened.
>>>
>>>
>>> I've developed an Index component with 2 implementations:
>>> 1- indexerDefault, which uses this kind of method
>>> 2- MultiThreadIndexer, optimized for multiple CPUs
>>>
>>> Maybe it would be worth integrating these components into the
>>> Lucene block.
>>>
>>> Nicolas Maisonneuve
>>>
>>>
>> --------------------------------------------------------
>>
>>                    If email from this address is not signed
>>                                  IT IS NOT FROM ME
>>
>>                          Always check the label, folks !!!!!
>> --------------------------------------------------------
>>
>>
>>
>
>

