lucene-java-user mailing list archives

From Chuck Williams <ch...@allthingslocal.com>
Subject [Fwd: Re: Time taken in Indexing when the index is already huge]
Date Tue, 05 Apr 2005 03:53:54 GMT
Goel, Nikhil writes (4/4/2005 7:14 PM):

>Hi, 
>
>   
>
>I have been using lucene-1.3.jar for quite some time, and we are using another library
>to store the index in a DB.
>
>When we started indexing, writer.optimize() used to take 600-800 milliseconds to return.
>Now our index has grown huge, to around 10 MB, and writer.optimize() takes around
>30-40 seconds, which is not acceptable for our solution. I timed the calls, and
>writer.optimize() is the one that takes most of this time.
>
> 
>
>So I am wondering whether anyone else has faced this problem when indexing data into
>an index that is already huge, or whether there is another way to manage such a huge index.
>
> 
>
>Here is the simple code which we use to index the data. 
>
>IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false); // create an IndexWriter
>
>writer.addDocument(doc); // doc is of type org.apache.lucene.document.Document
>
>writer.optimize(); // optimize is called on the IndexWriter; this call takes most of the time and is responsible for the delay
>
>writer.close(); // the IndexWriter is closed
>  
>
Does this code imply you are optimizing after every new document is 
indexed?  10MB is actually a pretty small index.  Depending on your 
inflow of documents, you should be able to optimize maybe once a day, 
during your application's least busy period.  Your IndexSearcher can 
still search your documents effectively while the index is unoptimized.  
As a first step, try not optimizing at all.

Chuck
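
The suggestion above (add documents without optimizing, and optimize rarely or not at all) could be sketched roughly as follows. This is only an illustration, not code from the thread: IndexSink is a hypothetical stand-in for the three IndexWriter calls quoted in the original message (addDocument, optimize, close), and CountingSink is a made-up fake that just counts calls so the sketch runs without Lucene on the classpath.

```java
// Sketch only. IndexSink and CountingSink are hypothetical names, not Lucene classes.
class DeferredOptimizeSketch {

    /** Hypothetical stand-in for the IndexWriter calls used in the thread. */
    interface IndexSink {
        void addDocument(String doc);
        void optimize();
        void close();
    }

    /** Fake sink that only counts calls, so the sketch runs without Lucene. */
    static class CountingSink implements IndexSink {
        int adds;
        int optimizes;
        int closes;

        public void addDocument(String doc) { adds++; }
        public void optimize() { optimizes++; }
        public void close() { closes++; }
    }

    /** Frequent path: add every document in the batch, then close. No optimize. */
    static void indexBatch(IndexSink sink, String[] docs) {
        try {
            for (String doc : docs) {
                sink.addDocument(doc);
            }
        } finally {
            sink.close();
        }
    }

    /** Rare path: a single optimize, e.g. run once a day in a quiet period. */
    static void optimizeOnce(IndexSink sink) {
        try {
            sink.optimize();
        } finally {
            sink.close();
        }
    }

    public static void main(String[] args) {
        CountingSink daytime = new CountingSink();
        indexBatch(daytime, new String[] {"doc1", "doc2", "doc3"});

        CountingSink nightly = new CountingSink();
        optimizeOnce(nightly);

        System.out.println("batch adds=" + daytime.adds + ", optimizes=" + daytime.optimizes);
        System.out.println("nightly optimizes=" + nightly.optimizes);
    }
}
```

With a real IndexWriter the shape would presumably be the same: the per-document path drops the optimize() call, and a separate, infrequent job opens a writer, optimizes, and closes it.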

> 
>
> 
>
>The time taken by the optimize call grows a lot as the index gets larger. I tried
>to look this up in Erik Hatcher and Otis Gospodnetić's book <http://www.manning.com/hatcher2#author#author>
>too, but everywhere it says Lucene is quite scalable and doesn't have trouble indexing
>even huge amounts of data. Can anyone please provide some insight into this?
>
> 
>
>Thanks.
>
>Nikhil
>



