lucene-java-user mailing list archives

From Tim Sturge <tstu...@metaweb.com>
Subject java gc with a frequently changing index?
Date Wed, 25 Jul 2007 18:41:33 GMT
Hi,

I am indexing a set of constantly changing documents. The change rate is
moderate (about 10 docs/sec over a 10M-document collection with a total
size of 6 GB), but I want the index to be right up to date: ideally
within a second, though within 5 seconds is acceptable.

Right now I have code that adds new documents to the index and deletes
old ones using updateDocument() in the Lucene 2.1 IndexWriter. To see
the changes, I need to recreate the IndexReader/IndexSearcher every
second or so. I am not calling optimize() on the index in the writer,
and the mergeFactor is 10.
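For illustration, the reopen-every-second pattern described above might be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not code from this message: the Lucene IndexSearcher is hidden behind a hypothetical Snapshot interface, and in real code the opener would presumably build something like new IndexSearcher(IndexReader.open(dir)). The hypothetical SearcherSwapper publishes each fresh snapshot atomically and uses reference counting so an old snapshot is closed only after the last in-flight query releases it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class SearcherSwapper {
    /** Hypothetical stand-in for an IndexSearcher over a point-in-time IndexReader. */
    public interface Snapshot extends AutoCloseable {
        @Override
        void close();
    }

    /** A snapshot plus a reference count; closed when the count drops to zero. */
    public static final class Ref {
        public final Snapshot snapshot;
        // Starts at 1: the "current" slot itself holds one reference.
        private final AtomicInteger refs = new AtomicInteger(1);

        Ref(Snapshot snapshot) { this.snapshot = snapshot; }

        /** Increments the count unless the snapshot has already been closed. */
        boolean tryIncRef() {
            while (true) {
                int n = refs.get();
                if (n == 0) return false; // already closed, caller must retry
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        void decRef() {
            if (refs.decrementAndGet() == 0) snapshot.close();
        }
    }

    private final Supplier<Snapshot> opener; // hypothetical factory, e.g. opens a new reader
    private final AtomicReference<Ref> current;

    public SearcherSwapper(Supplier<Snapshot> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(new Ref(opener.get()));
    }

    /** Acquire the current snapshot for one query; pass the result to release() afterwards. */
    public Ref acquire() {
        while (true) {
            Ref r = current.get();
            // If r was closed concurrently, current has already moved on; loop and retry.
            if (r.tryIncRef()) return r;
        }
    }

    public void release(Ref r) { r.decRef(); }

    /** Open a fresh snapshot, publish it, and drop the slot's reference to the old one. */
    public void reopen() {
        Ref fresh = new Ref(opener.get());
        Ref old = current.getAndSet(fresh);
        old.decRef(); // closes the old snapshot now, or when its last query releases it
    }

    /** Run reopen() at a fixed delay, e.g. about once per second as in the message. */
    public ScheduledExecutorService startReopening(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::reopen, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

This only controls when old searchers are closed; it does not by itself shorten GC pauses, since every discarded snapshot still becomes garbage that the collector has to reclaim.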

The problem I am facing is that the Java GC is terrible at collecting
the IndexSearchers I am discarding. My query time is usually 3 msec, but
I get GC pauses of 300 msec to 3 sec. (I assume it is collecting the
"tenured" generation during these pauses, which is where my old
IndexSearchers end up.)
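To quantify pauses like these, one option is to sample the JVM's standard GarbageCollectorMXBeans before and after a period of reopening. The sketch below (the class name GcStats is hypothetical) reports per-collector collection counts and cumulative collection time. It gives totals, so it yields averages rather than individual pause lengths; the -verbose:gc output is more direct for per-pause timings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcStats {
    /** Cumulative {collection count, collection time in ms} per collector name. */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> m = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the collector does not report it.
            m.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return m;
    }

    /** One line per collector: collections and total ms spent between the two snapshots. */
    public static String diff(Map<String, long[]> before, Map<String, long[]> after) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] b = before.getOrDefault(e.getKey(), new long[] { 0, 0 });
            long[] a = e.getValue();
            sb.append(String.format("%s: %d collections, %d ms%n",
                    e.getKey(), a[0] - b[0], a[1] - b[1]));
        }
        return sb.toString();
    }
}
```

Dividing the time delta by the collection delta for the old-generation collector gives its average pause over the sampled interval.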

I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC", and
calling System.gc() right after I close the old index, without much
luck: that gets the pauses down to 1 sec, but I get 3x as many of them.
I want pauses under 25 msec. So my question is: should I avoid
reloading my index this way? Should I keep a separate IndexReader (which
only deletes old documents) and another one for new documents? Is there
a standard technique for a quickly changing index?

Thanks,

Tim

