lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksander M. Stensby" <aleksander.sten...@integrasco.no>
Subject Re: Indexing very slow.
Date Mon, 03 Jul 2006 09:59:58 GMT
Ah, didnt see that, yeah, you should have something like

new IndexWriter..

	for each document, writer.add

writer.optimize()
writer.close()

batching it up will make it faster, yes

On Mon, 03 Jul 2006 11:43:03 +0200, Volodymyr Bychkoviak  
<vbychkoviak@i-hypergrid.com> wrote:

> Problem is hidden in these lines:
>  >       writer.optimize();
>  >       writer.close();
>
> You should keep one index writer open for all document additions and
> close it only after adding last document.
>
> Optimize() merges all index segments to single segment and as index
> grows it takes longer and longer. Optimizing has no impact on indexing
> so it should be performed once at the end of indexing to optimize index
> for searching.
>
> peter velthuis wrote:
>> When i start the program its fast.. about 10 docs per second. but
>> after about 15000 it slows down very much. Now it does 1 doc per
>> second and it is at nr# 40 000 after a whole night indexing. These are
>> VERY small docs with very little information.. THis is what and how i
>> index it:
>>
>>      Document doc = new Document();
>>                                 doc.add(new Field("field1", field1,
>> Field.Store.YES,
>>                        Field.Index.TOKENIZED));
>>                                 doc.add(new Field("field2", field2,
>> Field.Store.YES,
>>                        Field.Index.TOKENIZED));
>>                                 doc.add(new Field("field3", field3,
>> Field.Store.YES,
>>                        Field.Index.TOKENIZED));
>>                                 doc.add(new Field("field4", field4,
>> Field.Store.YES,
>>                        Field.Index.TOKENIZED));
>>                                 doc.add(new Field("field5", field5,
>> Field.Store.YES,
>>                        Field.Index.TOKENIZED));
>>                                doc.add(new Field("field6", field6,
>> Field.Store.YES,
>>                        Field.Index.TOKENIZED));
>>                                doc.add(new Field("contents",
>> contents, Field.Store.NO,
>>                        Field.Index.TOKENIZED));
>>
>>
>>
>> and this:
>>
>>
>>    String indexDirectory = "lucdex2";
>>
>>    private void indexDocument(Document document) throws Exception {
>>        Analyzer analyzer  = new StandardAnalyzer();
>>        IndexWriter writer = new IndexWriter(indexDirectory, analyzer,
>> false);
>>      //  writer.setUseCompoundFile(true);
>>        writer.addDocument(document);
>>        writer.optimize();
>>        writer.close();
>>
>>
>>
>> I read the data out mysql database.. but that cant be the problem..
>> since data is in memory.
>>
>> Also i use cygwin, when i try indexing on windows in a program like
>> netbeans or BlueJ it crashes windows after about 5000 docs. it sais
>> "beep" and a complete shutdown...
>>
>> Peter
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>



-- 
Aleksander M. Stensby
Software Developer
Integrasco A/S
aleksander.stensby@integrasco.no
Tlf.: +47 41 22 82 72

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message