lucene-java-user mailing list archives

From: Tim Sturge
Subject: java gc with a frequently changing index?
Date: Wed, 25 Jul 2007 18:41:33 GMT

I am indexing a set of constantly changing documents. The change rate is
moderate (about 10 docs/sec over a 10M document collection with a 6 GB
total size), but I want the index to be right up to date: ideally within
a second, though within 5 seconds is acceptable.

Right now I have code that adds new documents to the index and deletes
old ones using updateDocument() in the Lucene 2.1 IndexWriter. In order
to see the changes, I need to recreate the IndexReader/IndexSearcher
every second or so. I am not calling optimize() on the index in the
writer, and the mergeFactor is 10.
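
The reload pattern described above can be sketched roughly as follows.
This is only an illustration, not the actual code: ReopenSketch and its
opener are hypothetical names, the type parameter T stands in for
IndexSearcher, and nothing here calls Lucene itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Holds the "current" searcher and swaps in a fresh one on demand.
// T stands in for IndexSearcher; this class is hypothetical.
class ReopenSketch<T extends Closeable> {
    // Hypothetical factory that opens a new searcher over the index.
    private final Supplier<T> opener;
    private final AtomicReference<T> current;

    ReopenSketch(Supplier<T> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    // Query threads read the current instance from here.
    T get() {
        return current.get();
    }

    // Open a new instance, publish it, then close the old one.
    // The old instance becomes garbage as soon as nothing references it.
    void reopen() {
        T fresh = opener.get();
        T old = current.getAndSet(fresh);
        try {
            old.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling reopen() once a second from a timer thread would match the
schedule described. Note that this sketch closes the old instance
immediately, so a query still running against it could fail; a real
version would need to wait for in-flight queries to finish first.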

The problem I am facing is that Java GC is terrible at collecting the
IndexSearchers I am discarding. I usually have a 3 msec query time, but I
get GC pauses of 300 msec to 3 sec. (I assume it is collecting the
"tenured" generation in these pauses, which is where my old
IndexSearcher ends up.)
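
One way to check which collector is responsible is to read the
per-collector counters the JVM exposes through java.lang.management.
The sketch below is an assumption about how one might measure this, not
something from the setup described; GcTotals and the allocation loop are
hypothetical. The reported time is accumulated collection time per
collector, which is not the same as the length of individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints how many collections each collector ran, and the accumulated
// time spent in them, across a piece of work.
class GcTotals {
    // Sum of collection counts over all collectors (-1 means undefined).
    static long totalCount() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) sum += c;
        }
        return sum;
    }

    // Sum of accumulated collection time in milliseconds (-1 means undefined).
    static long totalTimeMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        long count0 = totalCount();
        long time0 = totalTimeMillis();

        // Stand-in workload: allocate and drop short-lived arrays.
        // In practice this would be the searcher reload plus queries.
        byte[][] junk = new byte[1024][];
        for (int i = 0; i < 200_000; i++) {
            junk[i % junk.length] = new byte[1024];
        }

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("collections during workload: " + (totalCount() - count0));
        System.out.println("gc time during workload (ms): " + (totalTimeMillis() - time0));
    }
}
```

If the counts for the old-generation collector rise in step with the
long pauses, that would support the guess about the tenured generation.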

I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC", and
calling System.gc() right after I close the old index, without much
luck: I get the pauses down to 1 sec, but get 3x as many, and I want
pauses under 25 msec. So my question is, should I be avoiding reloading
my index in this way? Should I keep one IndexReader for the existing
index (used only to delete old documents) and a separate one for new
documents? Is there a standard technique for a quickly changing index?


