cocoon-dev mailing list archives

From Nicolas Maisonneuve <n.maisonne...@gmail.com>
Subject LuceneIndexTransformer not optimized
Date Mon, 15 Nov 2004 23:34:01 GMT
The method that updates a document (reindexDocument) is not optimized.
Its current behavior is:

1- open the reader if it is not open (in practice it is always closed, because of step 3)
2- delete the document
3- close the reader
4- open the writer
5- write the document to the index
6- close the writer

(NOTE: with this behavior the merge factor is useless, because the
method indexes only one document each time the IndexWriter is opened.)

A common optimization in Lucene is to avoid opening and closing the
IndexReader and IndexWriter many times.

So I propose this simple optimization:
1- open the reader if it is not open
2- delete the document
3- store the Lucene document in a buffer (Stack)

// flush the buffer once it is full
if (buffer.size() >= max_buffer) {

   // switch to write mode
4-   close the reader
5-   open the writer
   for (each document in the buffer) {
6-      write the document
   }
7-   close the writer
     empty the buffer
}
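
The steps above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: FakeIndex is a hypothetical in-memory stand-in for the real Lucene IndexReader/IndexWriter pair (it just counts how often each one is opened), and the names BufferedIndexer, maxBuffer and flush are made up for the example, not taken from the Cocoon or Lucene code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BufferedIndexerSketch {

    /** Hypothetical stand-in for a Lucene index; it only tracks mode switches. */
    static class FakeIndex {
        final Map<String, String> docs = new LinkedHashMap<>();
        boolean readerOpen = false;
        boolean writerOpen = false;
        int readerOpens = 0;
        int writerOpens = 0;

        void openReader() {
            if (writerOpen) throw new IllegalStateException("writer is open");
            readerOpen = true;
            readerOpens++;
        }
        void closeReader() { readerOpen = false; }
        void openWriter() {
            if (readerOpen) throw new IllegalStateException("reader is open");
            writerOpen = true;
            writerOpens++;
        }
        void closeWriter() { writerOpen = false; }
        void delete(String id) {
            if (!readerOpen) throw new IllegalStateException("reader is closed");
            docs.remove(id);
        }
        void write(String id, String body) {
            if (!writerOpen) throw new IllegalStateException("writer is closed");
            docs.put(id, body);
        }
    }

    /** Deletes immediately, but delays the writes until the buffer is full. */
    static class BufferedIndexer {
        private final FakeIndex index;
        private final int maxBuffer;
        private final List<String[]> buffer = new ArrayList<>();

        BufferedIndexer(FakeIndex index, int maxBuffer) {
            this.index = index;
            this.maxBuffer = maxBuffer;
        }

        void reindexDocument(String id, String body) {
            if (!index.readerOpen) index.openReader();   // step 1
            index.delete(id);                            // step 2
            buffer.add(new String[] {id, body});         // step 3
            if (buffer.size() >= maxBuffer) flush();
        }

        /** Switches to write mode and writes everything buffered so far. */
        void flush() {
            if (index.readerOpen) index.closeReader();   // step 4
            if (buffer.isEmpty()) return;
            index.openWriter();                          // step 5
            for (String[] doc : buffer) {
                index.write(doc[0], doc[1]);             // step 6
            }
            index.closeWriter();                         // step 7
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        BufferedIndexer indexer = new BufferedIndexer(index, 100);
        for (int i = 0; i < 250; i++) {
            indexer.reindexDocument("doc" + i, "body " + i);
        }
        indexer.flush(); // write the documents still sitting in the buffer
        System.out.println(index.docs.size() + " docs, "
                + index.writerOpens + " writer opens, "
                + index.readerOpens + " reader opens");
    }
}
```

Note the extra flush() at the end of the run: without it, the documents left in a partly filled buffer would never be written. In this sketch, 250 documents with a buffer of 100 open the writer 3 times instead of 250.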


With this kind of method:
1- with a buffer of 100 documents, the number of mode switches
(write/read) is divided by 100, and indexing is much faster
2- the merge factor becomes really useful, because the IndexWriter
indexes more than one document each time it is opened


I've developed an index component with 2 implementations:
1- indexerDefault, which uses this kind of method
2- MultiThreadIndexer, optimized for multiple CPUs

Maybe it could be interesting to integrate these components into the Lucene block.

Nicolas Maisonneuve
