lucene-java-user mailing list archives

From "Goel, Nikhil" <nikhil.g...@verizon.com>
Subject Time taken in Indexing when the index is already huge
Date Tue, 05 Apr 2005 02:14:20 GMT
Hi, 

   

I have been using lucene-1.3.jar for quite some time, and we use another library to store
the index in a database.

When we started indexing, writer.optimize() took 600-800 milliseconds to return. Our index
has since grown considerably, to around 10 MB, and writer.optimize() now takes 30-40 seconds,
which is not acceptable for our solution. I timed the individual calls, and writer.optimize()
is the one that accounts for most of this time.
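
A minimal sketch of how such a timing can be taken, assuming a small helper wrapped around the call. The OptimizeTimer class and the timeMillis method are hypothetical names used only for illustration, and the sleep is a stand-in for writer.optimize(), so the block runs without Lucene on the classpath.

```java
public class OptimizeTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in for writer.optimize(). In real code the lambda would call
        // writer.optimize() and handle the IOException it can throw.
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("optimize() stand-in took " + elapsed + " ms");
    }
}
```

Wrapping addDocument(), optimize() and close() separately in the same way shows which of the three calls dominates.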

 

So I am wondering whether anyone else has faced the same problem when indexing data into an
index that is already huge, or whether there is another way to manage such a huge index.

 

Here is the simple code we use to index the data:

// Create an IndexWriter
IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false);

// doc is of type org.apache.lucene.document.Document
writer.addDocument(doc);

// optimize() is called on the IndexWriter. This is the call that takes
// most of the time and is responsible for the delay.
writer.optimize();

// The IndexWriter is closed
writer.close();

 

 

The time taken by the optimize() call grows a lot as the index gets larger. I tried to look
this up in Erik Hatcher and Otis Gospodnetić's book <http://www.manning.com/hatcher2#author#author>
too, but everywhere it says that Lucene is quite scalable and doesn't have trouble indexing
even huge amounts of data. Can anyone please provide some insight into this?

 

Thanks.

Nikhil

 

 

