lucene-java-user mailing list archives

From Jason Wu <jasonha...@gmail.com>
Subject Lucene Indexing performance issue
Date Wed, 22 Oct 2014 16:14:50 GMT
Hi Team,

I am a new user of Lucene 4.8.1. I have run into an indexing performance
problem that slows my application down considerably. I have tried several
approaches found through Google searches but still could not resolve it.
Any suggestions from the experts here would help me a lot.

One of my applications uses a Lucene index for fast data searching. When
the application starts, it indexes all the necessary data from the
database, which produces about 88 MB of index data. In this case, indexing
takes less than 4 minutes.

I also have a shell script task that runs every night and sends a JMX call
to my application to re-index all the data. The re-indexing method clears
the current index directory, reads the data from the database, and rebuilds
the index from scratch. Everything works fine at the beginning:
re-indexing takes a little more than 3 minutes. But after my application
has been running for a while (a day or two), re-indexing slows down greatly
and now takes more than 22 minutes.

Here is the procedure I use for both indexing and re-indexing:

   1. If index data exists inside the index directory, remove all of it.
   2. Create an IndexWriter with a 200 MB RAM buffer size and
   MaxMergesAndThreads set to (6, 6).
   3. Process the DB result set:
      - While looping over the result set, I reuse the same Document
      instance.
      - At the end of each iteration, I call indexWriter.addDocument(doc).
   4. IndexWriter.commit()
   5. IndexWriter.close()
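
To make step 1 concrete, here is a minimal sketch of clearing the index
directory with plain java.nio. It assumes the index lives in an ordinary
filesystem directory, that Lucene stores it as a flat set of files there,
and that no IndexWriter or IndexReader still holds those files open. The
class name IndexDirCleaner is hypothetical and is not part of Lucene or of
my application.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class IndexDirCleaner {

    /**
     * Deletes every regular file directly inside indexDir and returns how
     * many files were removed. The directory itself is kept. If indexDir
     * does not exist or is not a directory, nothing happens.
     */
    static int clear(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return 0;
        }
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(indexDir)) {
            for (Path entry : entries) {
                // Assumption: the index is flat, so subdirectories are skipped.
                if (Files.isRegularFile(entry)) {
                    Files.delete(entry);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

A failed delete surfaces as an IOException instead of being silently
ignored, which matters here because leftover files from a previous run
would mean the directory was not really cleared.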


I profiled the application while it was slow and found that the
indexWriter.addDocument method took most of the time. I then added some
logging code, as below:

long start = System.currentTimeMillis();
indexWriter.addDocument(doc);
totalAddDocTime += (System.currentTimeMillis() - start);

After several tests, I found that when indexing is slow, the total time
taken by indexWriter.addDocument(doc) is about 20 minutes.
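
A total alone does not show whether every addDocument call became slower
or whether a few calls stall for a long time. A slightly richer version of
the logging above could look like the sketch below. It is only an
illustration: the class AddDocTimer and the slow-call threshold are
hypothetical names and values, not something already in my code.

```java
// Hypothetical helper, for illustration only. Not thread-safe: it assumes
// addDocument is called from a single indexing thread.
final class AddDocTimer {
    private final long slowThresholdNanos;
    private long totalNanos;
    private long maxNanos;
    private long calls;
    private long slowCalls;

    AddDocTimer(long slowThresholdMillis) {
        this.slowThresholdNanos = slowThresholdMillis * 1_000_000L;
    }

    /** Records the duration of one call, measured with System.nanoTime(). */
    void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        maxNanos = Math.max(maxNanos, elapsedNanos);
        calls++;
        if (elapsedNanos >= slowThresholdNanos) {
            slowCalls++;
        }
    }

    long calls()       { return calls; }
    long slowCalls()   { return slowCalls; }
    long totalMillis() { return totalNanos / 1_000_000L; }
    long maxMillis()   { return maxNanos / 1_000_000L; }

    String summary() {
        return "calls=" + calls + " totalMs=" + totalMillis()
                + " maxMs=" + maxMillis() + " slowCalls=" + slowCalls;
    }
}
```

In the indexing loop this would be used as: take long start =
System.nanoTime() before indexWriter.addDocument(doc), then call
timer.record(System.nanoTime() - start) afterwards, and log
timer.summary() once the loop finishes. A small number of very slow calls
would point in a different direction than a uniform slowdown.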

During indexing, I also observed CPU usage sometimes going above 100%.

My application is assigned 6 GB of memory. While indexing runs, all other
processing modules are suspended, waiting for indexing to finish, and I
don't see any memory leak in my application.
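
One way to back up the memory observation with numbers is to snapshot the
JVM's garbage collection counters before and after a re-indexing run,
using the standard java.lang.management API. The sketch below is only an
illustration under that assumption; the class GcSnapshot is a
hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
final class GcSnapshot {
    final long gcCount;       // collections so far, summed over all collectors
    final long gcTimeMillis;  // approximate accumulated collection time
    final long heapUsedBytes; // heap in use when the snapshot was taken

    private GcSnapshot(long gcCount, long gcTimeMillis, long heapUsedBytes) {
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
        this.heapUsedBytes = heapUsedBytes;
    }

    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0L, gc.getCollectionCount());
            time += Math.max(0L, gc.getCollectionTime());
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new GcSnapshot(count, time, heap.getUsed());
    }

    /** Collections that happened between the earlier snapshot and this one. */
    long gcCountSince(GcSnapshot earlier) {
        return gcCount - earlier.gcCount;
    }

    /** Milliseconds spent in GC between the earlier snapshot and this one. */
    long gcTimeSince(GcSnapshot earlier) {
        return gcTimeMillis - earlier.gcTimeMillis;
    }
}
```

Taking one snapshot before step 1 and one after IndexWriter.close(), then
logging gcTimeSince and gcCountSince, would show how much of the 22
minutes, if any, goes to garbage collection.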

Can you give me some suggestions about my issue?

Thank you,

Jason
